
2024 | Book

Neural Information Processing

30th International Conference, ICONIP 2023, Changsha, China, November 20–23, 2023, Proceedings, Part V

Editors: Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li

Publisher: Springer Nature Singapore

Book Series: Lecture Notes in Computer Science


About this book

The six-volume set LNCS 14447–14452 constitutes the refereed proceedings of the 30th International Conference on Neural Information Processing, ICONIP 2023, held in Changsha, China, in November 2023.
The 652 papers presented in the proceedings set were carefully reviewed and selected from 1274 submissions. They focus on theory and algorithms; cognitive neurosciences; human-centred computing; and applications in neuroscience, neural networks, deep learning, and related fields.

Table of Contents

Frontmatter
Correction to: High-Resolution Self-attention with Fair Loss for Point Cloud Segmentation
Qiyuan Liu, Jinzheng Lu, Qiang Li, Bingsen Huang

Applications

Frontmatter
Text to Image Generation with Conformer-GAN

Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of the Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.

Zhiyu Deng, Wenxin Yu, Lu Che, Shiyu Chen, Zhiqiang Zhang, Jun Shang, Peng Chen, Jun Gong
MGFNet: A Multi-granularity Feature Fusion and Mining Network for Visible-Infrared Person Re-identification

Visible-infrared person re-identification (VI-ReID) aims to match the same pedestrian across different modalities captured by visible and infrared cameras. Existing works on retrieving pedestrians focus on mining shared feature representations with deep convolutional neural networks. However, single-granularity features are limited for identifying target pedestrians in complex VI-ReID tasks. In this study, we propose a new Multi-Granularity Feature Fusion and Mining Network (MGFNet) to fuse and mine the feature map information of the network. The network includes a Local Residual Spatial Attention (LRSA) module and a Multi-Granularity Feature Fusion and Mining (MGFM) module to jointly extract discriminative features. The LRSA module guides the network to learn fine-grained features that are useful for discrimination and to generate more robust feature maps. The MGFM module is then employed to extract and fuse pedestrian features at both global and local levels. Specifically, a new local feature fusion strategy is designed for the MGFM module to identify subtle differences between various pedestrian images. Extensive experiments on two mainstream datasets, SYSU-MM01 and RegDB, show that MGFNet outperforms existing techniques.

BaiSheng Xu, HaoHui Ye, Wei Wu
Isomorphic Dual-Branch Network for Non-homogeneous Image Dehazing and Super-Resolution

Removing non-homogeneous haze from real-world images is a challenging task. Meanwhile, the popularity of high-definition imaging systems and compute-limited smart mobile devices has resulted in new problems, such as the high computational load caused by haze removal for large-size images, or the severe information loss caused by the degradation of both the haze and image downsampling, when applying existing dehazing methods. To address these issues, we propose an isomorphic dual-branch dehazing and super-resolution network for non-homogeneous dehazing of a downsampled hazy image, which produces dehazed and enlarged images with sharp edges and high color fidelity. We quantitatively and qualitatively compare our network with several state-of-the-art dehazing methods under the condition of different downsampling scales. Extensive experimental results demonstrate that our method achieves superior performance in terms of both the quality of output images and the computational load.

Wenqing Kuang, Zhan Li, Ruijin Guan, Weijun Yuan, Ruting Deng, Yanquan Chen
Hi-Stega: A Hierarchical Linguistic Steganography Framework Combining Retrieval and Generation

Due to the widespread use of social media, linguistic steganography, which embeds secret messages into normal text to protect their security and privacy, has been widely studied and applied. However, existing linguistic steganography methods ignore the correlation between social network texts, so the resulting steganographic texts are isolated units prone to breakdowns in cognitive imperceptibility. Moreover, the embedding capacity of text is limited by the fragmented nature of social network text. In this paper, to make linguistic steganography practical in social network environments, we design a hierarchical linguistic steganography (Hi-Stega) framework. Combining the benefits of retrieval- and generation-based steganography methods, we divide the secret message into data information and control information by taking advantage of the fact that social network contexts are associative. The data information is obtained by retrieving the secret message in a normal network text corpus, and the control information is embedded in the process of comment or reply text generation. The experimental results demonstrate that the proposed approach achieves a higher embedding payload while guaranteeing imperceptibility and security. (All datasets and codes used in this paper are released at https://github.com/wanghl21/Hi-Stega .)
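The split between retrieved data information and generated control information rests on a generic idea in generation-based steganography: secret bits steer which candidate word the generator emits at each step. A toy sketch of that bit-steering mechanism (all names and the two-candidate vocabulary are illustrative, not Hi-Stega's actual implementation):

```python
def embed_bits(bits, candidates_per_step):
    """Embed one secret bit per generation step by picking the
    candidate word indexed by that bit (two candidates per step)."""
    return [cands[b] for b, cands in zip(bits, candidates_per_step)]

def extract_bits(words, candidates_per_step):
    """Recover the secret bits from the chosen words."""
    return [cands.index(w) for w, cands in zip(words, candidates_per_step)]

# Toy vocabulary: at each step the generator offers two plausible words.
steps = [["great", "nice"], ["day", "time"], ["today", "now"]]
stego = embed_bits([1, 0, 1], steps)
```

A receiver who shares the candidate lists can invert the process with `extract_bits`, which is what makes the generated comment text carry control information losslessly.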

Huili Wang, Zhongliang Yang, Jinshuai Yang, Yue Gao, Yongfeng Huang
Effi-Seg: Rethinking EfficientNet Architecture for Real-Time Semantic Segmentation

A popular strategy for designing a semantic segmentation model is to utilize a well-established pre-trained Deep Convolutional Neural Network (DCNN) as a feature extractor and replace the classification head with a decoder to generate segmented outputs. The advantage of this strategy is the ability to obtain a ready-made backbone with additional knowledge. However, there are several disadvantages, such as a lack of architectural knowledge, a significant semantic gap among the deep feature maps, and a lack of control over architectural changes to reduce memory overhead. To overcome these issues, we first study the complete architecture of EfficientNetV1 and EfficientNetV2, analyzing the architectural and performance gaps. Based on this analysis, we develop an efficient segmentation model called Effi-Seg by implementing several architectural changes to the backbone. This approach leads to better semantic segmentation results with improved efficiency. To enhance contextualization and achieve accurate object localization in the scene, we introduce a feature refinement module (FRM) and a semantic aggregation module (SAM) in the decoder. The complete segmentation network comprises only 1.49 million parameters and 8.4 GFLOPs. We evaluate the performance of the proposed model using three popular benchmarks, and it demonstrates highly competitive results on all three datasets while maintaining excellent efficiency.

Tanmay Singha, Duc-Son Pham, Aneesh Krishna
Quantum Autoencoder Frameworks for Network Anomaly Detection

Detecting anomalous activities in network traffic is important for the timely identification of emerging cyber-attacks. Accurate analysis of emerging patterns in network traffic is critical to identifying suspicious behaviors. This paper proposes novel quantum deep autoencoder-based anomaly detection frameworks for accurately detecting security attacks that emerge in the network. In particular, we propose three frameworks: the first based on several reconstruction-error thresholding methods; the second combining a quantum autoencoder with a one-class support vector machine; and the third combining a quantum autoencoder with a quantum random forest. The quantum frameworks' effectiveness in accurately detecting the attacks is evaluated using a publicly available benchmark dataset. Our empirical evaluations demonstrate improvements in accuracy and F1-score for all three frameworks.
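The first framework's reconstruction-error thresholding can be illustrated classically: an autoencoder trained on normal traffic reconstructs normal samples well, so samples with large reconstruction error are flagged. A minimal sketch in which precomputed reconstructions stand in for the (quantum) autoencoder's output; names and the toy data are illustrative:

```python
import numpy as np

def anomaly_scores(x, x_hat):
    """Per-sample reconstruction error (mean squared error)."""
    return np.mean((x - x_hat) ** 2, axis=1)

def detect(x, x_hat, threshold):
    """Flag samples whose reconstruction error exceeds the threshold."""
    return anomaly_scores(x, x_hat) > threshold

# Toy example: the autoencoder reproduces normal traffic well,
# but reconstructs the anomalous third sample poorly.
x     = np.array([[0.1, 0.2], [0.1, 0.2], [0.9, 0.9]])
x_hat = np.array([[0.1, 0.2], [0.1, 0.2], [0.1, 0.2]])
flags = detect(x, x_hat, threshold=0.05)
```

In practice the threshold is chosen from the error distribution on held-out normal data (e.g. a high percentile), which is where the paper's "several thresholds-based methods" differ.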

Moe Hdaib, Sutharshan Rajasegarar, Lei Pan
Spatially-Aware Human-Object Interaction Detection with Cross-Modal Enhancement

We propose a novel two-stage HOI detection model that incorporates cross-modal spatial information awareness. Human-object relative spatial relationships are highly relevant for specific HOI categories, but current approaches fail to model such crucial cues explicitly. We observed that relative spatial relationships possess properties that can be described in natural language easily and intuitively. Building on this observation and inspired by recent advancements in prompt-tuning, we design a Prompt-Enhanced Spatial Modeling (PESM) module that generates linguistic descriptions of spatial relations between humans and objects. PESM is capable of merging the explicit spatial information obtained by the aforementioned text descriptions with the implicit spatial information of the visual modality. Moreover, we devise a two-stage model architecture that effectively incorporates auxiliary cues to exploit the enhanced cross-modal spatial information. Extensive experiments conducted on the HICO-DET benchmark demonstrate that the proposed model outperforms state-of-the-art methods, indicating its effectiveness and superiority. The source code is available at https://github.com/liugaowen043/tsce .

Gaowen Liu, Huan Liu, Caixia Yan, Yuyang Guo, Rui Li, Sizhe Dang
Intelligent Trajectory Tracking Control of Unmanned Parafoil System Based on SAC Optimized LADRC

The unmanned parafoil system has become increasingly popular in a variety of military and civilian applications due to its remarkable carrying capacity and its ability to modify its flight path by adjusting the left or right paracord. To ensure the unmanned system’s safe completion of its flight mission, precise trajectory tracking control of the parafoil is essential. This paper presents an intelligent trajectory tracking approach that employs a soft actor-critic (SAC) algorithm to optimize linear active disturbance rejection control (LADRC). Using the eight-degree-of-freedom (DOF) parafoil model as a basis, we develop a trajectory tracking guidance law to address the underactuated problem. To ensure that the system’s yaw angle accurately tracks the guided yaw angle, we design a second-order LADRC. Additionally, the SAC algorithm is used to obtain adaptive parameters for the controller, ultimately enhancing tracking performance. Simulation results show that the proposed method can overcome wind disturbance and achieve convergence of tracking errors.

Yuemin Zheng, Jin Tao, Qinglin Sun, Jinshan Yang, Hao Sun, Mingwei Sun, Zengqiang Chen
CATS: Connection-Aware and Interaction-Based Text Steganalysis in Social Networks

Generative linguistic steganography in social networks carries huge potential for abuse and regulatory risks, with serious implications for information security, especially in the era of large language models. Many works have explored detecting steganographic texts with progressively enhanced imperceptibility, but they achieve only poor performance in real social network scenarios. One key reason is that these methods primarily focus on linguistic features, which are extremely insufficient owing to the fragmentation of social texts. In this paper, we propose a novel method called CATS (Connection-aware and interAction-based Text Steganalysis) to effectively detect potentially malicious steganographic texts. CATS captures social network connection information by graph representation learning, enhances linguistic features by contrastive learning, and fully integrates the features above via a novel feature interaction module. Our experimental results demonstrate that CATS outperforms existing methods by exploiting social network graph structure features and interactions in social network environments.

Kaiyi Pang, Jinshuai Yang, Yue Gao, Minhao Bai, Zhongliang Yang, Minghu Jiang, Yongfeng Huang
Syntax Tree Constrained Graph Network for Visual Question Answering

Visual Question Answering (VQA) aims to automatically answer natural language questions related to given image content. Existing VQA methods integrate vision modeling and language understanding to explore the deep semantics of the question. However, these methods ignore the significant syntax information of the question, which plays a vital role in understanding the essential semantics of the question and guiding visual feature refinement. To fill the gap, we propose a novel Syntax Tree Constrained Graph Network (STCGN) for VQA based on entity message passing and syntax trees. This model extracts a syntax tree from questions to obtain more precise syntax information. Specifically, we parse questions and obtain the question syntax tree using the Stanford syntax parsing tool. Syntactic phrase features and question features are extracted at the word and phrase levels using a hierarchical tree convolutional network. We then design a message-passing mechanism for phrase-aware visual entities and capture entity features according to a given visual context. Extensive experiments on the VQA2.0 dataset demonstrate the superiority of our proposed model.

Xiangrui Su, Qi Zhang, Chongyang Shi, Jiachang Liu, Liang Hu
CKR-Calibrator: Convolution Kernel Robustness Evaluation and Calibration

Recently, Convolutional Neural Networks (CNNs) have achieved excellent performance in some areas of computer vision, including face recognition, character recognition, and autonomous driving. However, many CNN-based models still cannot be deployed in real-world scenarios due to poor robustness. In this paper, focusing on the classification task, we attempt to evaluate and optimize the robustness of CNN-based models from a new perspective: the convolution kernel. Inspired by the discovery that the root cause of model decision errors lies in the wrong responses of convolution kernels, we propose a convolution kernel robustness evaluation metric based on the distribution of convolution kernel responses. Then, we devise the Convolution Kernel Robustness Calibrator, termed CKR-Calibrator, to optimize key but non-robust convolution kernels. Extensive experiments demonstrate that CKR-Calibrator improves the accuracy of existing CNN classifiers by 1%–4% on clean datasets and 1%–5% on corrupted datasets, and improves accuracy by about 2% over SOTA methods. The evaluation and calibration source code is open-sourced at https://github.com/cym-heu/CKR-Calibrator .
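The idea of scoring kernels by their response distribution can be illustrated with a toy proxy: compare a kernel's mean response on clean versus corrupted inputs, and treat a large shift as a sign the kernel is not robust. This is not the paper's metric, only an illustrative sketch in which flattened patches stand in for convolution windows:

```python
import numpy as np

def kernel_response(kernel, patches):
    """Mean response of one flattened kernel over a set of image patches."""
    return float(np.mean(patches @ kernel))

def robustness_gap(kernel, clean_patches, corrupt_patches):
    """Shift in a kernel's mean response under corruption; larger gaps
    mark kernels that would be candidates for calibration."""
    return abs(kernel_response(kernel, clean_patches)
               - kernel_response(kernel, corrupt_patches))

# Toy kernel that responds only to the first patch dimension.
kernel  = np.array([1.0, 0.0])
clean   = np.array([[1.0, 0.0], [1.0, 0.0]])
corrupt = np.zeros((2, 2))
gap = robustness_gap(kernel, clean, corrupt)
```

A calibrator would then fine-tune or regularize only the kernels with the largest gaps, leaving robust kernels untouched.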

Yijun Bei, Jinsong Geng, Erteng Liu, Kewei Gao, Wenqi Huang, Zunlei Feng
SGLP-Net: Sparse Graph Label Propagation Network for Weakly-Supervised Temporal Action Localization

Present weakly-supervised methods for Temporal Action Localization focus primarily on capturing temporal context. However, these approaches are limited in capturing semantic context, at the risk of ignoring snippets that are far apart but share the same action category. To address this issue, we propose an action label propagation network that utilizes sparse graph networks to effectively explore both temporal and semantic information in videos. The proposed SGLP-Net comprises two key components. One is a multi-scale temporal feature embedding module, a novel generic method that extracts both local and global temporal features of the videos during the initial stage using CNNs and self-attention. The other is an action label propagation mechanism, which uses graph networks for feature aggregation and label propagation. To avoid the issue of excessive feature completeness, we optimize training using sparse graph convolutions. Extensive experiments are conducted on the THUMOS14 and ActivityNet1.3 benchmarks, and the results demonstrate the superiority of the proposed method. Code can be found at https://github.com/xyao-wu/SGLP-Net .

Xiaoyao Wu, Yonghong Song
VFIQ: A Novel Model of ViT-FSIMc Hybrid Siamese Network for Image Quality Assessment

Image Quality Assessment (IQA) measures how humans perceive the quality of images. In this paper, we propose a new model named VFIQ, a ViT-FSIMc Hybrid Siamese Network for Full-Reference IQA, that combines signal-processing and learning-based approaches, the two categories of IQA algorithms. Specifically, we design a hybrid Siamese network that leverages the Vision Transformer (ViT) and the feature similarity index measure (FSIMc). To evaluate the performance of the proposed VFIQ model, we first pre-train the ViT module on the PIPAL dataset, and then evaluate our VFIQ model on several popular benchmark datasets, including TID2008, TID2013, and LIVE. The experimental results show that our VFIQ model outperforms state-of-the-art IQA models in the commonly used correlation metrics PLCC, KRCC, and SRCC. We also demonstrate the usefulness of our VFIQ model in different vision tasks, such as image recovery and generative model evaluation.
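Of the three correlation metrics, SRCC is the easiest to compute by hand: it is the Pearson correlation of the rank orders of predicted scores and subjective scores. A minimal sketch assuming no tied scores (with ties, averaged ranks as computed by `scipy.stats.spearmanr` would be needed):

```python
import numpy as np

def srcc(pred, mos):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes all values are distinct (no tie handling)."""
    rp = np.argsort(np.argsort(pred)).astype(float)  # ranks of predictions
    rm = np.argsort(np.argsort(mos)).astype(float)   # ranks of subjective scores
    rp -= rp.mean()
    rm -= rm.mean()
    return float((rp @ rm) / np.sqrt((rp @ rp) * (rm @ rm)))

# A perfectly monotonic predictor gets SRCC = 1 even when scales differ.
pred = np.array([0.1, 0.4, 0.5, 0.9])
mos  = np.array([1.0, 2.0, 3.0, 4.0])
```

Because SRCC depends only on rank order, it rewards models that order images correctly by quality, regardless of any monotonic miscalibration of the raw scores.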

Junrong Huang, Chenwei Wang
Spiking Reinforcement Learning for Weakly-Supervised Anomaly Detection

Weakly-supervised Anomaly Detection (AD) has achieved significant performance improvement compared to unsupervised methods by harnessing very little additional labeling information. However, most existing methods ignore anomalies in unlabeled data by simply treating the whole unlabeled set as normal; that is, they fail to resist such noise that may considerably disturb the learning process, and more importantly, they cannot extract key anomaly features from these unlabeled anomalies, which are complementary to those labeled ones. To solve this problem, a spiking reinforcement learning framework for weakly-supervised AD is proposed, named ADSD. Compared with artificial neural networks, the spiking neural network can effectively resist input perturbations due to its unique coding methods and neuronal characteristics. From this point of view, by using spiking neurons with noise filtering and threshold adaptation, as well as a multi-weight evaluation method to discover the most suspicious anomalies in unlabeled data, ADSD achieves end-to-end optimization for the utilization of a few labeled anomaly data and rare unlabeled anomalies in complex environments. The agent in ADSD has robustness and adaptability when exploring potential anomalies in the unknown space. Extensive experiments show that our method ADSD significantly outperforms four popular baselines in various environments while maintaining good robustness and generalization performance.

Ao Jin, Zhichao Wu, Li Zhu, Qianchen Xia, Xin Yang
Resource-Aware DNN Partitioning for Privacy-Sensitive Edge-Cloud Systems

With recent advances in deep neural networks (DNNs), there is a significant increase in IoT applications leveraging AI with edge-cloud infrastructures. Nevertheless, deploying large DNN models on resource-constrained edge devices is still challenging due to limitations in computation, power, and application-specific privacy requirements. Existing model partitioning methods, which deploy a partial DNN on an edge device while processing the remaining portion of the DNN on the cloud, mainly emphasize communication and power efficiency. However, DNN partitioning based on the privacy requirements and resource budgets of edge devices has not been sufficiently explored in the literature. In this paper, we propose awareSL, a model partitioning framework that splits DNN models based on the computational resources available on edge devices, preserving the privacy of input samples while maintaining high accuracy. In our evaluation of multiple DNN architectures, awareSL effectively identifies the split points that adapt to resource budgets of edge devices. Meanwhile, we demonstrate the privacy-preserving capability of awareSL against existing input reconstruction attacks without sacrificing inference accuracy in image classification tasks.

Aolin Ding, Amin Hass, Matthew Chan, Nader Sehatbakhsh, Saman Zonouz
A Frequency Reconfigurable Multi-mode Printed Antenna

A multi-frequency reconfigurable antenna is proposed. The designed antenna can be electronically tuned to operate in the 2.4 GHz band defined by the IEEE 802.11b standard and in the ultra-wideband (UWB) low band. An L-shaped branch and a polygon patch are used as the main radiators of the antenna. Two varactor diodes are mounted on the slots and one PIN diode is mounted on the L-shaped branch to vary the effective electrical length of the antenna. The simulated and measured results agree well. While maintaining miniaturization and high frequency-reconfiguration stability, good impedance matching (S11 < −10 dB) is obtained in several operating bands, and the overall impedance bandwidth covers 2.34–2.58 GHz and 3.11–5.14 GHz. The antenna provides solutions for operation within WiMAX, WLAN, and 5G sub-6 GHz bands.

Yanbo Wen, Huiwei Wang, Menggang Chen, Yawei Shi, Huaqing Li, Chuandong Li
Multi-view Contrastive Learning for Knowledge-Aware Recommendation

Knowledge-aware recommendation has attracted increasing attention due to its wide application in alleviating data sparsity and cold-start problems, but real-world knowledge graphs (KGs) contain much noise from irrelevant entities. Recently, contrastive learning, a self-supervised learning (SSL) method, has shown excellent anti-noise performance in recommendation tasks. However, the inconsistency between the noisy embeddings used in SSL tasks and the original embeddings used in recommendation tasks limits the model’s ability. We propose a Multi-view Contrastive learning for Knowledge-aware Recommendation framework (MCKR) to solve the above problems. To remove inconsistencies, MCKR unifies the input of the SSL and recommendation tasks and learns more representations from the contrastive learning method. To alleviate the noise from irrelevant entities, MCKR preprocesses the KG triples according to their type and randomly perturbs the graph structure with different weights. Then, a novel distance-based graph convolutional network is proposed to learn more reliable entity information in the KG. Extensive experiments on three popular benchmark datasets show that our approach achieves state-of-the-art performance. Further analysis shows that MCKR also performs well in reducing data noise.

Ruiguo Yu, Zixuan Li, Mankun Zhao, Wenbin Zhang, Ming Yang, Jian Yu
PYGC: A PinYin Language Model Guided Correction Model for Chinese Spell Checking

Chinese Spell Checking (CSC) is an NLP task that detects and corrects erroneous characters in Chinese texts. Since people often use pinyin (pronunciation of Chinese characters) input methods or speech recognition to type text, most of these errors are misuses of phonetically or semantically similar characters. Previous attempts fuse pinyin information into the embedding layer of pre-trained language models. However, although such models can learn from phonetic knowledge, they cannot make good use of this knowledge for error correction. In this paper, we propose a PinYin language model Guided Correction model (PYGC), which regards the Chinese pinyin sequence as an independent language model. Our model builds on two parallel transformer encoders to capture pinyin and semantic features respectively, with a late fusion module that fuses the two hidden representations to generate the final prediction. Besides, we perform an additional pronunciation prediction task on pinyin embeddings to ensure the reliability of the pinyin language model. Experiments on the widely used SIGHAN benchmark and the newly released CSCD-IME dataset with mainly pinyin-related errors show that our method outperforms current state-of-the-art approaches by a remarkable margin. Furthermore, isolation tests demonstrate that our model has the best generalization ability on unseen spelling errors. (Code available at https://github.com/Imposingapple/PYGC .)
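The pinyin-guided correction idea can be illustrated with a toy example: for a suspicious character, restrict candidates to those sharing its pinyin and let a language-model score pick the replacement. The confusion set, pinyin table, and unigram scores below are illustrative stand-ins for PYGC's learned components:

```python
def correct(char, pinyin_of, confusion, lm_score):
    """Replace char with the highest-scoring candidate that shares its pinyin."""
    candidates = [c for c in confusion.get(char, [char])
                  if pinyin_of[c] == pinyin_of[char]]
    return max(candidates, key=lm_score)

# Toy data: 在 and 再 are a classic phonetic confusion pair (both "zai").
pinyin_of = {"在": "zai", "再": "zai", "见": "jian"}
confusion = {"在": ["在", "再"]}
scores    = {"在": 0.2, "再": 0.8}   # toy "pinyin language model" scores
fixed = correct("在", pinyin_of, confusion, lambda c: scores.get(c, 0.0))
```

In PYGC the scoring comes from a full transformer encoder over the pinyin sequence rather than unigram scores, but the constraint-then-rank structure is the same.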

Haoping Chen, Xukai Wang
Empirical Analysis of Multi-label Classification on GitterCom Using BERT and ML Classifiers

To maintain development awareness, simplify project coordination, and prevent misinterpretation, communication is essential for software development teams. Instant private messaging, group chats, and code sharing are just a few of the capabilities that chat rooms provide to meet the communication demands of software development teams, all in real time. Consequently, chat rooms have gained popularity among developers. Gitter is one such platform, and the conversations it contains may be a treasure trove of data for academics researching open-source software systems. This research uses the GitterCom dataset, the largest collection of carefully labelled and curated Gitter developer messages, and performs multi-label classification for the ‘Purpose’ category in the dataset. An extensive empirical analysis is performed on 6 feature selection techniques, 14 machine learning classifiers, and a BERT transformer architecture with layer-by-layer comparison. We achieve proficient results through a research pipeline involving the Extra Trees and Random Forest classifiers, with median AUC (OvR) performance of 0.94 and 0.92, respectively. Furthermore, the proposed research pipeline could be utilized for generic multi-label text classification on software developer forum text data.

Bathini Sai Akash, Lov Kumar, Vikram Singh, Anoop Kumar Patel, Aneesh Krishna
A Lightweight Safety Helmet Detection Network Based on Bidirectional Connection Module and Polarized Self-attention

Safety helmets worn by construction workers in substations can reduce the accident rate in construction operations. With the mature development of smart grid and target detection technology, automatic monitoring of helmet wearing using a cloud-edge collaborative approach is of great significance in power construction safety management. However, existing target detectors perform a large number of redundant calculations during multi-scale feature fusion, resulting in additional computational overhead. To solve this problem, we propose a lightweight target detection model, PFBDet. First, we design a cross-stage local bottleneck module, FNCSP, and on this basis, combined with Polarized Self-Attention, propose an efficient lightweight feature extraction network, PFNet, to optimize computational complexity while obtaining more feature information. Second, to address the redundancy overhead of multi-scale feature fusion, we design a bidirectional connection module (BCM) based on GSConv and the lightweight upsampling operator CARAFE, and on this basis, combined with a single-aggregation cross-layer network module, propose an efficient multi-scale feature fusion structure, BCM-PAN. To verify the effectiveness of the method, we conducted extensive experiments on helmet image datasets including Helmeted, Ele-hat, and SHWD. The experimental results show that the proposed method achieves better recognition accuracy with less computational effort than most high-performance target detectors, and can meet the real-time requirements of detecting helmet wearing in power-system construction scenarios.

Tianyang Li, Hanwen Xu, Jinxu Bai
Direct Inter-Intra View Association for Light Field Super-Resolution

Light field (LF) cameras record both the intensity and directions of light rays in a scene with a single exposure. However, due to the inevitable trade-off between spatial and angular dimensions, the spatial resolution of LF images is limited, which makes LF super-resolution (LFSR) a research hotspot. The key to LFSR is complementation across views and the extraction of high-frequency information within each view. Due to the high dimensionality of LF data, previous methods usually model these two processes separately, which results in insufficient inter-view information fusion. In this paper, the LF Transformer is proposed for comprehensive perception of 4D LF data. Necessary inter-intra view correlations can be directly established inside each LF Transformer block, so it can handle complex disparity variations of LF. Then, based on LF Transformers, 4DTNet is designed, which comprehensively performs inter-intra view high-frequency information extraction. Extensive experiments on public datasets demonstrate that 4DTNet outperforms the current state-of-the-art methods both numerically and visually.

Da Yang, Hao Sheng, Shuai Wang, Rongshan Chen, Zhang Xiong
Responsive CPG-Based Locomotion Control for Quadruped Robots

Quadruped robots with flexible movement are gradually replacing traditional mobile robots in many settings. To improve the motion stability and speed of the quadruped robot, this paper presents a responsive gradient CPG (RG-CPG) approach. Specifically, the method introduces a vestibular sensory feedback mechanism into the gradient CPG (central pattern generator) model and uses a differential evolution algorithm to optimize the vestibular sensory feedback parameters. Simulation results show that the movement stability and linear movement velocity of the quadruped robot controlled by RG-CPG are effectively improved, and that the robot can cope with complex terrains. Prototype experiments demonstrate that RG-CPG works on real quadruped robots.

Yihui Zhang, Cong Hu, Binbin Qiu, Ning Tan
Vessel Behavior Anomaly Detection Using Graph Attention Network

Vessel behavior anomaly detection is of great significance for ensuring navigation safety, combating maritime crimes, and maritime management. Unfortunately, most current research ignores the temporal dependencies and correlations between ship features. We propose a novel vessel behavior anomaly detection framework using a graph attention network (VBAD-GAT), which characterizes these complicated relationships and dependencies through a graph attention module consisting of a time graph attention module and a feature graph attention module. We also adopt a graph structure learning process to obtain the correct feature graph structure. Moreover, we propose a joint detection strategy combining reconstruction and prediction modules to capture both local ship features and long-term relationships between ship features. We demonstrate the effectiveness of the graph attention module and the joint detection strategy through an ablation study. In addition, comparative experiments with three baselines, including quantitative analysis and visualization, show that VBAD-GAT outperforms all baselines.

Yuanzhe Zhang, Qiqiang Jin, Maohan Liang, Ruixin Ma, Ryan Wen Liu
TASFormer: Task-Aware Image Segmentation Transformer

In image segmentation tasks for real-world applications, the number of semantic categories can be very large, and the number of objects in them can vary greatly. In this case, the multi-channel representation of the output mask for the segmentation model is inefficient. In this paper we explore approaches to overcome this problem by using a single-channel output mask and additional input information about the desired class for segmentation. We call this information a task embedding and learn it during training of the neural network model. In our case, the number of tasks is equal to the number of segmentation categories. This approach allows us to build universal models that can be conveniently extended to an arbitrary number of categories without changing the architecture of the neural network. To investigate this idea we developed a transformer neural network segmentation model named TASFormer. We demonstrate that the highest-quality results for task-aware segmentation are obtained using adapter technology as part of the model. To evaluate the quality of segmentation, we introduce a binary intersection over union (bIoU) metric, which is an adaptation of the standard mIoU for models with a single-channel output. We analyze its distinguishing properties and use it to compare modern neural network methods. The experiments were carried out on the universal ADE20K dataset, on which the proposed TASFormer-based approach demonstrates state-of-the-art segmentation quality. The software implementation of the TASFormer method and the bIoU metric is publicly available at www.github.com/subake/TASFormer .
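For single-channel binary masks, a bIoU of this kind reduces to a plain intersection-over-union with a convention for empty masks. A minimal sketch of one plausible reading (the paper's exact bIoU definition may differ in details such as the empty-mask convention):

```python
import numpy as np

def binary_iou(pred, target):
    """IoU between two binary masks (single-channel outputs)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:          # both masks empty: define IoU as 1 (perfect match)
        return 1.0
    inter = np.logical_and(pred, target).sum()
    return float(inter / union)

# Toy 2x2 masks: one pixel agrees, one pixel is a false positive.
pred   = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
```

The empty-mask branch matters for task-conditioned models: when the requested category is absent from the image, a correct all-zero prediction should score 1, not 0/0.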

Dmitry Yudin, Aleksandr Khorin, Tatiana Zemskova, Darya Ovchinnikova
Unsupervised Joint-Semantics Autoencoder Hashing for Multimedia Retrieval

Cross-modal hashing has emerged as a prominent approach for large-scale multimedia information retrieval, offering advantages in computational speed and storage efficiency over traditional methods. However, unsupervised cross-modal hashing methods still face challenges from the lack of practical semantic labeling guidance and from cross-modal heterogeneity. In this paper, we propose a new unsupervised cross-modal hashing method called Unsupervised Joint-Semantics Autoencoder Hashing (UJSAH) for multimedia retrieval. First, we introduce a joint-semantics similarity matrix that effectively preserves the semantic information in multimodal data. This matrix integrates the original neighborhood structure of the data, allowing it to better capture the associations between different modalities and ensuring that it can accurately mine the underlying relationships within the data. Second, we design a dual prediction network-based autoencoder, which implements the interconversion of semantic information between modalities and ensures that the generated binary hash codes preserve the semantic information of each modality. Experimental results on several classical datasets show a significant improvement in the performance of UJSAH on multimodal retrieval tasks relative to existing methods. The experimental code is published at https://github.com/YunfeiChenMY/UJSAH.

Yunfei Chen, Jun Long, Yinan Li, Yanrui Wu, Zhan Yang
TKGR-RHETNE: A New Temporal Knowledge Graph Reasoning Model via Jointly Modeling Relevant Historical Event and Temporal Neighborhood Event Context

Temporal knowledge graph reasoning (TKGR) has attracted great interest for its role in enriching the naturally incomplete temporal knowledge graph (TKG) by uncovering new events from existing ones with temporal information. Although the majority of existing TKGR methods attain commendable performance, they still suffer from several problems: a limited ability to capture intricate long-term event dependencies in the context of relevant historical events, and difficulty handling events that have insufficient historical information or are influenced by other events. To alleviate these issues, we propose a novel TKGR method named TKGR-RHETNE, which jointly models the context of relevant historical events and temporal neighborhood events. For the historical event view, we introduce an encoder based on the transformer Hawkes process and the self-attention mechanism to effectively capture long-term event dependencies, thus modeling the event evolution process continuously. For the neighborhood event view, we propose a neighborhood aggregator to model the potential influence between events with insufficient historical information and other events, implemented by integrating a random walk strategy with the TKG topological structure. Comprehensive experiments on five benchmark datasets demonstrate the superior performance of our proposed model (code is publicly available at https://github.com/wanwano/TKGR-RHETNE).

Jinze Sun, Yongpan Sheng, Ling Zhan, Lirong He
High-Resolution Self-attention with Fair Loss for Point Cloud Segmentation

Applying deep learning techniques to analyze point cloud data has emerged as a prominent research direction. However, insufficient integration of spatial and feature information within point clouds and unbalanced classes in real-world datasets have hindered progress. Given the success of self-attention mechanisms in numerous domains, we apply the High-Resolution Self-Attention (HRSA) module as a plug-and-play solution for point cloud segmentation. The proposed HRSA module preserves high-resolution internal representations in both the spatial and feature dimensions. Additionally, by adjusting the gradients of dominant and weak classes, we introduce the Fair Loss to address the unbalanced class distribution of real-world datasets and improve the network's inference capabilities. The introduced modules are seamlessly integrated into an MLP-based architecture tailored for large-scale point cloud processing, resulting in a new segmentation network called PointHR. PointHR achieves impressive performance, with mIoU scores of 69.8% and 74.5% on S3DIS Area-5 and 6-fold cross-validation, respectively. With a significantly smaller number of parameters, these results make PointHR highly competitive in point cloud semantic segmentation.

Qiyuan Liu, Jinzheng Lu, Qiang Li, Bingsen Huang
Transformer-Based Video Deinterlacing Method

Deinterlacing is a classical issue in video processing, aimed at generating progressive video from interlaced content. Many precious videos that are difficult to reshoot still contain interlaced content. Previous methods have primarily focused on simple interlacing mechanisms and have struggled to handle the complex artifacts present in real-world early videos. Therefore, we propose a Transformer-based method for deinterlacing, which consists of a Feature Extractor, a De-Transformer, and a Residual DenseNet module. By incorporating the Transformer's self-attention, our proposed method better exploits inter-frame movement correlation. Additionally, we combine a properly designed loss function and residual blocks to train an end-to-end deinterlacing model. Extensive experimental results on various video sequences demonstrate that our proposed method outperforms state-of-the-art methods on different tasks by 1.41–2.64 dB. Furthermore, we also discuss several related issues, such as the rationality of the network structure. The code for our proposed method is available at https://github.com/Anonymous2022-cv/DeT.git.

Chao Song, Haidong Li, Dong Zheng, Jie Wang, Zhaoyi Jiang, Bailin Yang
SCME: A Self-contrastive Method for Data-Free and Query-Limited Model Extraction Attack

Previous studies have revealed that artificial intelligence (AI) systems are vulnerable to adversarial attacks. Among them, model extraction attacks fool the target model by generating adversarial examples on a substitute model. The core of such an attack is training a substitute model as similar to the target model as possible, where the simulation process can proceed in a data-dependent or data-free manner. Compared with data-dependent methods, data-free ones have proven more practical in the real world, since they train the substitute model with synthesized data. However, the distribution of these fake data lacks diversity and cannot probe the decision boundary of the target model well, resulting in an unsatisfactory simulation. Besides, these data-free techniques need a vast number of queries to train the substitute model, increasing time and computation costs as well as the risk of exposure. To solve these problems, we propose a novel data-free model extraction method named SCME (Self-Contrastive Model Extraction), which considers both inter- and intra-class diversity when synthesizing fake data. In addition, SCME introduces the Mixup operation to augment the fake data, which can explore the target model's decision boundary effectively and improve the simulation capacity. Extensive experiments show that the proposed method yields diversified fake data. Moreover, our method shows superiority in many different attack settings under the query-limited scenario; in particular, for untargeted attacks, SCME outperforms SOTA methods by 11.43% on average across five benchmark datasets.

Renyang Liu, Jinhong Zhang, Kwok-Yan Lam, Jun Zhao, Wei Zhou
CSEC: A Chinese Semantic Error Correction Dataset for Written Correction

Existing research primarily focuses on spelling and grammatical errors in English, such as missing or wrongly added characters; this kind of shallow error has been well studied. In contrast, many deep-level errors in real applications remain unsolved, especially in Chinese, and semantic errors are one of them. Semantic errors are mainly caused by an inaccurate understanding of the meanings and usage of words, and few studies have investigated them. We thus focus on semantic error correction and propose a new dataset, called CSEC, which includes 17,116 sentences and six types of errors. Semantic errors are often found through the dependency relations of sentences. We thus propose a novel method called Desket (Dependency Syntax Knowledge Enhanced Transformer). Desket solves the CSEC task by (1) capturing the syntax of the sentence, including dependency relations and part-of-speech tags, and (2) using dependencies to guide the generation of the correct output. Experiments on the CSEC dataset demonstrate the superior performance of our model against existing methods.

Wenxin Huang, Xiao Dong, Meng-xiang Wang, Guangya Liu, Jianxing Yu, Huaijie Zhu, Jian Yin
Contrastive Kernel Subspace Clustering

As a class of nonlinear subspace clustering methods, kernel subspace clustering has shown promising performance in many applications. This paper focuses on the kernel selection problem in the kernel subspace clustering model. Currently, the kernel function is typically chosen by single kernel or multiple kernel methods. The former relies on a given kernel function, which poses challenges in clustering tasks with limited prior information, since a suitable kernel function is difficult to determine beforehand. Multiple kernel methods usually assume that the optimal kernel lies near a series of predefined base kernels, which limits the expressive ability of the optimal kernel; furthermore, they tend to have higher solution complexity than single kernel methods. To address these limitations, this paper utilizes contrastive learning to learn the optimal kernel adaptively and proposes the Contrastive Kernel Subspace Clustering (CKSC) method. Unlike multiple kernel approaches, CKSC is not constrained by the multiple kernel assumption. Specifically, CKSC integrates a contrastive regularization into the kernel subspace clustering model, encouraging neighboring samples in the original space to stay nearby in the reproducing kernel Hilbert space (RKHS). In this way, the resulting kernel mapping can preserve the cluster structure of the data, which benefits downstream clustering tasks. Clustering experiments on seven benchmark data sets validate the effectiveness of the proposed CKSC method.

Qian Zhang, Zhao Kang, Zenglin Xu, Hongguang Fu
UATR: An Uncertainty Aware Two-Stage Refinement Model for Targeted Sentiment Analysis

Targeted sentiment analysis aims to predict the fine-grained sentiment polarity of a given term. Although some achievements have been made in recent years, the accuracy of targeted sentiment multi-classification is still insufficient: a considerable proportion of samples are incorrectly predicted as the opposite polarity. To this end, we investigate the effectiveness of utilizing model uncertainty and propose a two-stage refinement prediction model based on uncertainty, called UATR. UATR models uncertainty by inferring the distribution of model weights and is more robust when learning from small data. Experiments on the standard SemEval14 benchmark show that our model not only reduces the proportion of samples incorrectly predicted as the opposite polarity but also improves accuracy and F1 by more than 2% and 3%, respectively, compared to current state-of-the-art models.

Xiaoting Guo, Qingsong Yin, Wei Yu, Qingbing Ji, Wei Xiao, Tao Chang, Xiaodong Wang
AttIN: Paying More Attention to Neighborhood Information for Entity Typing in Knowledge Graphs

Entity types in knowledge graphs (KGs) have been employed extensively in downstream natural language processing (NLP) tasks. Currently, knowledge graph entity typing is usually inferred from embeddings, but a single embedding approach ignores interactions between neighboring entities and relations. In this paper, we propose AttIN, a model that pays more attention to entity neighborhood information. More specifically, AttIN contains three independent inference modules: a BERT module that uses the target entity's neighbors to infer the entity type individually, a context transformer that aggregates information based on the different contributions of each neighbor, and an interaction information aggregator (IIAgg) module that aggregates the entity neighborhood information into a long sequence. In addition, we use exponentially weighted pooling to combine these predictions. Experiments on the FB15kET and YAGO43kET datasets show that AttIN outperforms existing competitive baselines without needing extra semantic information in sparse knowledge graphs.

Yingtao Wu, Weiwen Zhang, Hongbin Zhang, Huanlei Chen, Lianglun Cheng
Text-Based Person re-ID by Saliency Mask and Dynamic Label Smoothing

Current text-based person re-identification (re-ID) models tend to learn salient features of image and text, which, however, are prone to failure when identifying persons in very similar dress, because image contents with observable but indescribable differences may share an identical textual description. To address this problem, we propose a saliency-mask-based re-ID model that learns non-salient but highly discriminative features, which work together with the salient features to provide more robust pedestrian identification. To further improve performance, we propose a dynamic label smoothing based cross-modal projection matching loss (named CMPM-DS) to train our model; CMPM-DS can adaptively adjust the smoothing degree of the true distribution. We conduct extensive ablation and comparison experiments on two popular re-ID benchmarks to demonstrate the effectiveness of our model and loss function, improving the existing best R@1 by 0.33% on CUHK-PEDES and 4.45% on RSTPReID.

Yonghua Pang, Canlong Zhang, Zhixin Li, Liaojie Hu
Robust Multi-view Spectral Clustering with Auto-encoder for Preserving Information

Multi-view clustering is a prominent research topic in machine learning that leverages consistent and complementary information from multiple views to improve clustering performance. Graph-based multi-view clustering methods learn a consistent graph whose edges encode pairwise similarities between samples, and generate sample representations using spectral clustering. However, most existing methods seldom consider reconstructing the input data from the encoded representation during representation learning, which results in information loss. To address this limitation, we propose a robust multi-view spectral clustering method with an auto-encoder for preserving information (RMVSC-AE), which minimizes the reconstruction error between the input data and the reconstructed representation to preserve knowledge. Specifically, we discover a graph representation by jointly optimizing the graph Laplacian and auto-encoder reconstruction terms. Moreover, we introduce a sparse noise term to further enhance the quality of the learned consistent graph. Extensive experiments on six multi-view datasets verify the efficacy of the proposed method.

Xiaojie Wang, Ye Liu, Hongshan Pu, Yuchen Mou, Chaoxiong Lin
Learnable Color Image Zero-Watermarking Based on Feature Comparison

Zero-watermarking is one of the solutions for protecting the copyright of color images without tampering with them. Existing zero-watermarking algorithms either rely on static classical techniques or employ pre-trained deep learning models, which limits the adaptability of zero-watermarking to complex and dynamic environments; such algorithms are prone to fail when encountering novel or complex noise. To address this issue, we propose a self-supervised anti-noise learning method for color image zero-watermarking that leverages feature matching to achieve lossless protection of images. In our method, we use a learnable feature extractor and a baseline feature extractor and compare the features extracted by the two. Moreover, we introduce a combined weighted noise layer to enhance robustness against combined noise attacks. Extensive experiments show that our method outperforms other methods in terms of effectiveness and efficiency.

Baowei Wang, Changyu Dai, Yufeng Wu
P-IoU: Accurate Motion Prediction Based Data Association for Multi-object Tracking

Multi-object tracking in complex scenarios remains a challenging task due to objects’ irregular motions and indistinguishable appearances. Traditional methods often approximate the motion direction of objects solely based on their bounding box information, leading to cumulative noise and incorrect association. Furthermore, the lack of depth information in these methods can result in failed discrimination between foreground and background objects due to the perspective projection of the camera. To address these limitations, we propose a Pose Intersection over Union (P-IoU) method to predict the true motion direction of objects by incorporating body pose information, specifically the motion of the human torso. Based on P-IoU, we propose PoseTracker, a novel approach that combines bounding box IoU and P-IoU effectively during association to improve tracking performance. Exploiting the relative stability of the human torso and the confidence of keypoints, our method effectively captures the genuine motion cues, reducing identity switches caused by irregular movements. Experiments on the DanceTrack and MOT17 datasets demonstrate that the proposed PoseTracker outperforms existing methods. Our method highlights the importance of accurate motion prediction of objects for data association in MOT and provides a new perspective for addressing the challenges posed by irregular object motion.
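For intuition, the association idea described above can be sketched as a cost that blends standard bounding-box IoU with an IoU over torso boxes. This is purely illustrative: the weighting scheme, the construction of torso boxes from shoulder/hip keypoints, and the field names `bbox`/`torso` are our assumptions, not PoseTracker's actual implementation.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def association_cost(det, trk, w=0.5):
    """Blend bounding-box IoU with torso-box (pose) IoU into one cost.

    In practice the 'torso' box would be derived from shoulder/hip
    keypoints, weighted by keypoint confidence.
    """
    return 1.0 - (w * iou(det["bbox"], trk["bbox"])
                  + (1.0 - w) * iou(det["torso"], trk["torso"]))
```

A matcher (e.g. the Hungarian algorithm) would then minimize this cost over all detection-track pairs.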

Xinya Wu, Jinhua Xu
WCA-VFnet: A Dedicated Complex Forest Smoke Fire Detector

Forest fires pose a significant threat to ecosystems, causing extensive damage. We use low-resolution forest fire imagery, whose multi-scene, multi-environment, multi-temporal, and multi-angle nature introduces high complexity; this choice aims to enhance the model's generalizability across diverse and intricate fire detection scenarios. While state-of-the-art detection algorithms such as YoloX, Deformable DETR, and VarifocalNet have demonstrated remarkable performance in object detection, their effectiveness in detecting forest smoke fires, especially in complex scenarios with small smoke and flame targets, remains limited. To address this issue, we propose WCA-VFnet, an innovative approach that incorporates the Weld C-A component, a method featuring shared convolution and fusion attention. Furthermore, we have curated a distinctive dataset called T-SMOKE, specifically tailored for detecting small-scale, low-resolution forest smoke fires. Our experimental results show that WCA-VFnet achieves an improvement of approximately 35% in average precision (AP) for detecting small flame targets compared to Deformable DETR.

Xingran Guo, Haizheng Yu, Xueying Liao
Label Selection Algorithm Based on Ant Colony Optimization and Reinforcement Learning for Multi-label Classification

Multi-label classification handles scenarios where an instance can be annotated with multiple non-exclusive but semantically related labels simultaneously. Despite significant progress, multi-label classification remains challenging, as emerging applications lead to high-dimensional label spaces. Researchers have generalized feature dimensionality reduction techniques to the label space by using label correlation information, yielding two families of techniques: label embedding and label selection. Many successful algorithms exist for label embedding, but label selection has received less attention. In this paper, we propose a label selection algorithm for multi-label classification, LS-AntRL, which combines ant colony optimization (ACO) and reinforcement learning (RL). The method helps the ant colony search the space more effectively by using a temporal difference (TD) RL algorithm that learns directly from the ants' experience. To learn the heuristic, we model the ACO problem as an RL problem: label selection is formulated as a Markov decision process (MDP) in which labels represent states and each ant's choice among unvisited labels represents the set of actions. The state transition rules of the ACO algorithm constitute the transition function of the MDP, and the state value function, updated by the TD formula, forms the heuristic function in ACO. After performing label selection, we train a binary weighted neural network to recover the low-dimensional label space back to the original label space. We apply the above model to five benchmark datasets with more than 100 labels. Experimental results show that our method achieves better classification performance than other advanced methods in terms of two evaluation metrics (Precision@n and DCG@n).
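The coupling described above, TD-learned state values acting as the ACO heuristic, can be sketched as follows. This is our own minimal illustration under assumed conventions (the reward signal, hyperparameters, and exact transition formula are not specified by the abstract).

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) backup: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def transition_probs(current, unvisited, tau, V, a=1.0, b=1.0):
    """Classic ACO transition rule: p(s' | s) proportional to
    pheromone(s, s')^a * heuristic(s')^b, with the TD-learned state
    values V playing the role of the heuristic."""
    scores = {s: (tau[(current, s)] ** a) * (max(V[s], 1e-6) ** b)
              for s in unvisited}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}
```

After each ant's tour, a reward (e.g. from a downstream classifier's quality on the selected labels) would drive `td0_update` along the visited label sequence, gradually biasing future ants toward valuable labels.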

Yuchen Pan, Yulin Xue, Jun Li, Jianhua Xu
Reversible Data Hiding Based on Adaptive Embedding with Local Complexity

In recent years, most reversible data hiding (RDH) algorithms have considered the impact of texture information on embedding performance. The distortion caused by embedding secret data in an image's smooth region is much less than in a non-smooth region, because embedding in the smooth region corresponds to fewer invalid shifting pixels (ISPs) in histogram shifting. However, though effective, existing schemes do not calculate the local complexity precisely enough, which results in inaccurate texture division and limits the reduction of distortion. Therefore, a new RDH scheme based on adaptive embedding with local complexity (AELC) is proposed to improve embedding performance effectively. Specifically, the cover image is divided into two subsets by a checkerboard pattern. Then the local complexity of each pixel is computed from the correlation between adjacent pixels (CBAP). Finally, secret data are adaptively and preferentially embedded into the regions with lower local complexity in each subset. Experimental results show that the proposed algorithm performs best regarding invalid shifting pixels, maximum embedding capacity (EC), and peak signal-to-noise ratio (PSNR) compared to some state-of-the-art RDH methods.
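For intuition, the checkerboard split and a simple neighborhood-based local-complexity measure can be sketched as below. The variance-like measure here is our own illustrative assumption; the paper's CBAP computation may differ in detail.

```python
import numpy as np

def checkerboard_sets(img):
    """Split pixel coordinates into the two checkerboard subsets."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = (ys + xs) % 2 == 0
    return mask, ~mask  # boolean masks for the two subsets

def local_complexity(img, y, x):
    """Illustrative complexity: spread of the 4-neighborhood values
    around their mean. Lower values indicate smoother regions, which
    are preferred for embedding."""
    h, w = img.shape
    nbrs = [float(img[j, i])
            for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if 0 <= j < h and 0 <= i < w]
    nbrs = np.asarray(nbrs)
    return float(np.abs(nbrs - nbrs.mean()).sum())
```

An embedder would then sort each subset's pixels by this score and embed into the lowest-complexity pixels first.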

Chao Wang, Yicheng Zou, Yaling Zhang, Ju Zhang, Jichuan Chen, Bin Yang, Yu Zhang
Generalized Category Discovery with Clustering Assignment Consistency

Generalized category discovery (GCD) is an important open-world task: given a dataset consisting of labeled and unlabeled instances, it automatically clusters the unlabeled samples using information transferred from the labeled subset. A major challenge in GCD is that unlabeled novel-class samples and unlabeled known-class samples are mixed together in the unlabeled dataset. To conduct GCD without knowing the number of classes in the unlabeled dataset, we propose a co-training-based framework that encourages clustering consistency. Specifically, we first introduce weak and strong augmentation transformations to generate two sufficiently different views of the same sample. Then, based on the co-training assumption, we propose a consistency representation learning strategy, which encourages consistency between feature-prototype similarity and clustering assignment. Finally, we use the discriminative embeddings learned during the semi-supervised representation learning process to construct a sparse network and apply a community detection method to obtain the clustering results and the number of categories simultaneously. Extensive experiments show that our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.

Xiangli Yang, Xinglin Pan, Irwin King, Zenglin Xu
CInvISP: Conditional Invertible Image Signal Processing Pipeline

Standard RGB (sRGB) images processed by the image signal processing (ISP) pipeline of digital cameras have a nonlinear relationship with the scene irradiance. Therefore, low-level vision tasks that work best in a linear color space are not well suited to the sRGB color space. To address this issue, this paper proposes an approach called CInvISP that provides a bidirectional mapping between the nonlinear sRGB and linear CIE XYZ color spaces. To ensure a fully invertible ISP, the basic building blocks in our framework adopt the structure of an invertible neural network. As camera-style information is embedded in sRGB images, it must be completely removed during backward mapping and properly incorporated during forward mapping. To this end, a conditional vector is extracted from the sRGB input and inserted into each invertible building block. Experiments show that, compared to other mapping approaches, CInvISP achieves a more accurate bidirectional mapping between the two color spaces. Moreover, it is verified that such a precise bidirectional mapping facilitates low-level vision tasks, including image denoising and retouching.

Duanling Guo, Kan Chang, Yahui Tang, Mingyang Ling, Minghong Li
Ignored Details in Eyes: Exposing GAN-Generated Faces by Sclera

Advances in generative adversarial networks (GANs) have significantly improved the quality of synthetic facial images, posing threats to many vital areas. Thus, identifying whether a presented facial image is synthesized is of forensic importance. Our fundamental finding is the lack of capillaries in the sclera of GAN-generated faces, which is caused by the lack of physical/physiological constraints in the GAN model. Because real eyes always contain some capillaries, one can distinguish real faces from GAN-generated ones by carefully examining the sclera area. Following this idea, we first extract the sclera area from a probe image, then feed it into a residual attention network to distinguish GAN-generated faces from real ones. The proposed method is validated on the Flickr-Faces-HQ and StyleGAN2/StyleGAN3-generated face datasets. Experiments demonstrate that the capillaries in the sclera are a very effective feature for identifying GAN-generated faces. Our code is available at: https://github.com/10961020/Deepfake-detector-based-on-blood-vessels.

Tong Zhang, Anjie Peng, Hui Zeng
A Developer Recommendation Method Based on Disentangled Graph Convolutional Network

Crowdsourcing Software Development (CSD) solves software development tasks by integrating resources from global developers. As more and more companies and developers move onto CSD platforms, the platforms' information overload makes it difficult to recommend suitable developers for a software development task. The interaction behavior between developers and tasks is often the result of complex latent factors. Existing developer recommendation methods are mostly based on deep learning, where the learned feature representations ignore the influence of latent factors on interaction behavior, leading to representations that lack robustness and interpretability. To solve these problems, we present a Developer Recommendation method based on a Disentangled Graph Convolutional network (DRDGC). Specifically, we use a disentangled graph convolutional network to separate the latent factors within the original features. Each latent factor contains specific information and is independent of the others, which makes the features constructed from the latent factors more robust and interpretable. Extensive experimental results show that DRDGC can effectively recommend the right developer for a task and outperforms the baseline methods.

Yan Lu, Junwei Du, Lijun Sun, Jinhuan Liu, Lei Guo, Xu Yu, Daobo Sun, Haohao Yu
Novel Method for Radar Echo Target Detection

Radar target detection, one of the pivotal techniques in radar systems, aims to extract valuable information such as target distance and velocity from received energy echo signals. However, with advancements in aviation and electronic information technology, radar detection targets, scenarios, and environments have undergone profound transformations. Most conventional radar target detection methods are based on Constant False Alarm Rate (CFAR) techniques, which rely on certain distribution assumptions; when detection scenarios become intricate or dynamic, the performance of these detectors degrades significantly. Therefore, ensuring the robust performance of radar target detection models in complex task scenarios has emerged as a crucial concern. In this paper, we propose a radar target detection method based on a hybrid architecture of convolutional neural networks and autoencoder networks, comprising clutter suppression and target detection modules. We conducted ablation and comparative experiments on publicly available and simulated radar echo datasets. The ablation experiments validated the effectiveness of the clutter suppression module, while the comparative experiments demonstrated the superior performance of our proposed method compared to alternative approaches in complex background scenarios.

Zhiwei Chen, Dechang Pi, Junlong Wang, Mingtian Ping
Backmatter
Metadata
Title
Neural Information Processing
Editors
Biao Luo
Long Cheng
Zheng-Guang Wu
Hongyi Li
Chaojie Li
Copyright Year
2024
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-9980-73-4
Print ISBN
978-981-9980-72-7
DOI
https://doi.org/10.1007/978-981-99-8073-4
