Neural Information Processing
32nd International Conference, ICONIP 2025, Okinawa, Japan, November 20–24, 2025, Proceedings, Part IV
- 2026
- Book
- Editors
- Tadahiro Taniguchi
- Chi Sing Andrew Leung
- Tadashi Kozuno
- Junichiro Yoshimoto
- Mufti Mahmud
- Maryam Doborjeh
- Kenji Doya
- Publisher
- Springer Nature Singapore
About this book
This 6-volume set constitutes the proceedings of the 32nd International Conference on Neural Information Processing, ICONIP 2025, held in Okinawa, Japan, in November 2025.
The 197 full papers presented in this book were carefully reviewed and selected from 1092 submissions.
Table of Contents
- Computer Vision and Image Processing
- Frontmatter
- A Novel Plug-and-Play Method for LiDAR and 4D Radar Fusion
Pengfei Qi, Jinlai Zhang, Yan Su, Xiang Zou, Yi Huang, Yonggang Tong, Kaiming Wang, Qiqi Li, Lin Hu
Abstract: Recent advances in autonomous driving highlight the complementary strengths of LiDAR and 4D Radar, yet effective fusion of these modalities remains challenging due to discrepancies in resolution, noise, and sensing characteristics. In this paper, we propose a novel plug-and-play pipeline for LiDAR and 4D Radar fusion aimed at enhancing 3D object detection performance. The framework comprises three key modules: (1) a LiDAR–4D Radar Cross-Attention module that integrates a Cross-Attention Block and a LiDAR Denoise Layer to effectively exploit complementary features across modalities while suppressing LiDAR noise; (2) a LiDAR Denoise Layer that further refines LiDAR representations using radar-guided filtering; and (3) a Dual-Channel Attention Fusion mechanism that adaptively combines the enhanced LiDAR and 4D Radar features. Extensive experiments are conducted on six baselines, evaluating both 3D and bird’s-eye view (BEV) detection outputs. Our method consistently outperforms existing approaches, achieving up to a 13.48 mAP improvement over baselines.
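As a concrete illustration of the cross-attention fusion this abstract describes, a minimal sketch follows; the module name, feature dimensions, and residual design are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of cross-modal attention: LiDAR tokens query 4D Radar tokens.
# Shapes and the residual design are assumptions; the paper's module also
# includes a denoise layer and dual-channel attention fusion.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_feat: torch.Tensor, radar_feat: torch.Tensor) -> torch.Tensor:
        # lidar_feat: (B, N_lidar, C) queries; radar_feat: (B, N_radar, C) keys/values
        fused, _ = self.attn(lidar_feat, radar_feat, radar_feat)
        # The residual keeps the LiDAR signal dominant while mixing in radar cues.
        return self.norm(lidar_feat + fused)

lidar = torch.randn(2, 1024, 128)
radar = torch.randn(2, 256, 128)
out = CrossModalAttention()(lidar, radar)  # (2, 1024, 128)
```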
- GaussianFocus: Constrained Attention Focus for 3D Gaussian Splatting
Zexu Huang, Min Xu, Stuart Perry
Abstract: Recent developments in 3D reconstruction and neural rendering have significantly propelled the capabilities of photo-realistic 3D scene rendering across various academic and industrial fields. The 3D Gaussian Splatting technique, alongside its derivatives, integrates the advantages of primitive-based and volumetric representations to deliver top-tier rendering quality and efficiency. Despite these advancements, the method tends to generate excessive redundant noisy Gaussians overfitted to every training view, which degrades the rendering quality. Additionally, while 3D Gaussian Splatting excels in small-scale and object-centric scenes, its application to larger scenes is hindered by constraints such as limited video memory, excessive optimization duration, and variable appearance across views. To address these challenges, we introduce GaussianFocus, an innovative approach that incorporates a patch attention algorithm to refine rendering quality and implements a Gaussian constraints strategy to minimize redundancy. Moreover, we propose a subdivision reconstruction strategy for large-scale scenes, dividing them into smaller, manageable blocks for individual training. Our results indicate that GaussianFocus significantly reduces unnecessary Gaussians and enhances rendering quality, surpassing existing State-of-The-Art (SoTA) methods. Furthermore, we demonstrate the capability of our approach to effectively manage and render large scenes, such as urban environments, whilst maintaining high fidelity in the visual output. Please visit https://github.com/HZXu-526/GaussianFocus for code.
- DDRASR: Double Dimension Retractable Attention Super-Resolution
Haoyu Tian, Jiangbo Xu, Gaolin Yang, Zijian Xue
Abstract: Image super-resolution aims to reconstruct high-resolution images from their low-resolution counterparts, enhancing visual quality and recovering high-frequency details. Although deep learning-based methods have significantly advanced SR performance, existing approaches still face three main challenges: first, neglect or ineffective utilization of information across different dimensions; second, ineffective aggregation of multi-scale information, which often leads to unnatural visual artifacts, especially around image edges and complex texture regions; third, difficulty in balancing local detail preservation with global contextual understanding. In this paper, we propose an SR model that integrates both spatial and channel attention mechanisms to extract and aggregate features across dimensions more effectively. We introduce an Adaptive Interaction Module to fuse features within each block and further enhance the network’s representational capacity. Additionally, we introduce an alternating structure that leverages dense attention for fine-grained local details and sparse attention for long-range dependencies, enabling a broader receptive field. Experimental results demonstrate that our model achieves superior reconstruction quality with improved perceptual realism, validating the effectiveness of the proposed dual-attention aggregation strategy.
- DTMPose: Depth Transform-Enhanced Mamba Pose Estimation Framework for Efficient 2D Keypoint Detection
Guanting Dong, Kei Kawamura
Abstract: 2D keypoint detection plays an important role in the fields of group behavior analysis, motion capture, human-computer interaction, and security monitoring. However, in high-density crowd environments or on edge devices with limited computational resources, it is still a major challenge to improve inference efficiency while ensuring detection accuracy. To this end, this paper proposes a keypoint detection framework called ‘DTMPose’, whose core innovation is to replace the computationally intensive attention module with a Mamba-based state-space model (SS2D mechanism) and to introduce a receptive-field-enhanced convolution (e.g., ‘DPConv’) in the key parts to improve the detection of local occlusions and edge details. Compared to models which rely only on the self-attention mechanism, DTMPose reduces computational overhead whilst still capturing global dependencies, and effectively mitigates local keypoint ambiguities through enhanced convolution. Experimental results on the COCO dataset show that DTMPose maintains a low parameter count with an accuracy of about 76% AP, demonstrating its deployment potential in high-density crowd scenarios and on mobile edge devices, as well as providing a new feasible solution for applications such as people flow monitoring and group behavior analysis.
- GGMNet: Gradient Imagery-Guided Multi-scale Feature Fusion for Remote Sensing Change Detection
Eksan Payzullam, Baokun Su, Guoxia Wang, Changle Yin, Gang Shi
Abstract: To address the challenges posed by edge information loss and inadequate multi-scale feature integration in remote sensing image change detection, this paper proposes a novel approach: the gradient image-guided multi-scale feature fusion network (GGMNet). The primary innovations comprise a gradient-guided module (GRAD) and a multi-scale depthwise fusion module (MSDF). The GRAD module is responsible for generating gradient maps from the gradient variation magnitude of bi-temporal images (captured at two different time points for the same location), enhancing edge features using linear self-attention, and suppressing pseudo-change noise through spatial-channel attention. The MSDF integrates features at various levels to explore hierarchical semantic information, ranging from specifics to global context. To assess the efficacy of change detection, we use four widely adopted metrics: precision, recall, the F1 score, and intersection over union (IoU). Extensive experimentation on the LEVIR-CD and SYSU-CD datasets has been conducted to demonstrate the efficacy of the proposed method.
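A minimal sketch of the gradient-map step attributed to the GRAD module is given below; the Sobel operator and the absolute-difference formulation are assumptions, not the authors' exact definition.

```python
# Gradient-magnitude change between two co-registered acquisitions.
# Sobel kernels and the |.|-difference are illustrative assumptions.
import torch
import torch.nn.functional as F

def gradient_magnitude(img: torch.Tensor) -> torch.Tensor:
    # img: (B, 1, H, W) grayscale tensor
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)  # vertical Sobel kernel
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def bitemporal_gradient_map(t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
    # Highlights locations where edge structure changed between the two dates.
    return (gradient_magnitude(t1) - gradient_magnitude(t2)).abs()
```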
- Automated Knot Detection and Pairing for Wood Analysis in the Timber Industry
Guohao Lin, Shidong Pan, Rasul Khanbayov, Changxi Yang, Ani Khaloian-Sarnaghi, Andriy Kovryga
Abstract: Knots in wood are critical to both aesthetics and structural integrity, making their detection and pairing essential in timber processing. However, traditional manual annotation is labor-intensive and inefficient, necessitating automation. This paper proposes a lightweight and fully automated two-stage pipeline for knot detection and pairing. In the detection stage, high-resolution surface images of wooden boards are collected using industrial-grade cameras, and a large-scale dataset is manually annotated and preprocessed. After transfer learning, YOLOv8l achieves an mAP@0.5 of 0.887. In the pairing stage, we define and extract a set of multidimensional features from detected knots. A triplet neural network is used to map the features into a latent space, enabling clustering algorithms to identify and pair corresponding knots. The triplet network with learnable weights achieves a pairing accuracy of 0.85. Further analysis reveals that the distances from the knot’s start and end points to the bottom of the wooden board, together with the longitudinal coordinates, play crucial roles in achieving high pairing accuracy. Our experiments validate the effectiveness of the proposed solution, demonstrating the potential of AI in advancing wood science and industry.
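The pairing stage rests on a standard triplet-embedding recipe; a minimal sketch, with hypothetical feature and embedding dimensions and margin, is given below. After training, knots whose embeddings fall into the same cluster are treated as a pair.

```python
# Hedged sketch of triplet-based knot pairing: the embedding pulls two views of
# the same physical knot together and pushes different knots apart. The input
# feature size (8), embedding size (16), and margin are assumptions.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 16))
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor   = embed(torch.randn(32, 8))  # knot features seen on one board face
positive = embed(torch.randn(32, 8))  # the same knots seen on another face
negative = embed(torch.randn(32, 8))  # unrelated knots
loss = triplet_loss(anchor, positive, negative)
loss.backward()  # embeddings are then clustered to identify knot pairs
```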
- Enhancing Hyperspectral Remote Salient Object Detection via Spectral Recalibration, Multi-scale Decoding, and Global Context Modeling
Pengxuan Liu, Wuzhen Shi, Yang Wen
Abstract: Hyperspectral remote sensing images (HSIs) possess rich spectral resolution and spatial structural information, offering significant advantages in object detection tasks. Although existing approaches such as DSSN have made progress in this field, there remain substantial challenges, such as limited suppression of spectral redundancy, inadequate multi-scale feature fusion, insufficient global context modeling, and loss functions that are not fully adapted to the characteristics of HSIs. In this paper, we propose an enhanced deep network architecture that systematically incorporates four key innovations: a Spectral Squeeze-and-Excitation module (SpectralSE), a Feature Pyramid Decoder (FPNDecoder), a deep Spectral-Spatial Transformer module (SpecSpaTransformer), and a joint BCE + Dice loss function. The proposed method is thoroughly evaluated on HRSSD, the only publicly available benchmark dataset specifically designed for hyperspectral salient object detection. Experimental results demonstrate that our model consistently outperforms existing approaches across multiple mainstream metrics, achieving superior detection accuracy, structural consistency, and model robustness, thereby establishing a new state-of-the-art (SOTA) in the field.
- FSMHNet: A Frequency-Aware and Stroke-Enhanced Multiscale Hash Network for Oracle Image Retrieval
Yanni Zuo, Zhongyuan Yang, Yongge Liu, Kurban Ubul
Abstract: As the earliest mature Chinese character system, oracle bone inscriptions (OBI) contain valuable historical and cultural information, making their image retrieval techniques crucial for text interpretation and cultural heritage digitization. However, structural variations caused by inscriptions, burial conditions, and topography, along with high-frequency noise and edge blurring in the images, pose significant challenges to accurate retrieval. To address these challenges, this paper proposes a frequency-aware and stroke-enhanced multiscale hash network (FSMHNet). The method introduces the fast Fourier transform (FFT) in the feature extraction stage to map features into the frequency domain, then filters and adjusts different frequency components through a weighted gating mechanism. Building on this foundation, we combine the discrete wavelet transform (DWT) to achieve multiscale decomposition and design an asymmetric pyramid convolution module (APC) to capture feature differences between horizontal and vertical strokes of oracle bone characters, thereby improving feature orientation sensitivity. Additionally, we design a stroke feature enhancement (SFE) module to further strengthen the semantic expression of local key strokes, while incorporating the multi-head self-attention mechanism (MHSA) to model global associations and enhance overall semantic representation. Finally, the system outputs efficient retrieval representations through hash coding. Experimental results demonstrate the method’s effectiveness and generalization capability in both same-domain and cross-domain retrieval tasks constructed on the Oracle-MNIST and Oracle-241 datasets.
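The frequency-domain filtering step can be pictured as a learnable gate applied to the feature spectrum; the sketch below, with an assumed sigmoid gating form, conveys the idea without reproducing the paper's weighted gating mechanism exactly.

```python
# Illustrative sketch of frequency-domain feature gating: map features to the
# spectrum with an FFT, reweight frequency bins with a learnable gate, and map
# back. The sigmoid gate and per-bin parameterization are assumptions.
import torch
import torch.nn as nn

class FrequencyGate(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable weight per channel and rFFT frequency bin.
        self.gate = nn.Parameter(torch.zeros(channels, height, width // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map
        spec = torch.fft.rfft2(x, norm="ortho")   # complex spectrum
        spec = spec * torch.sigmoid(self.gate)    # soft per-frequency gating
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
```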
- MSVDNet: A Multi-scale Video Desnowing Network for Real World
Mengzhen Xue, Gang Zhou, Linghui Ma, Li Zhang, Yanyun Zhou, Zhenhong Jia
Abstract: Video desnowing is a critical research topic in the field of computer vision, as snowfall can obscure video content and degrade visual quality. However, due to the diverse scales, shapes, and complex motion patterns of snowflakes, existing video desnowing methods still face challenges when dealing with real-world snowy scenarios. To effectively remove snowflakes and restore high-quality videos under real snowy conditions, we propose a Multi-Scale Video Desnowing Network (MSVDNet), which combines the strengths of Transformer and UNet architectures. Specifically, we design a Multi-Scale Feature Fusion Module (MSFFM) to enhance multi-scale feature perception by integrating encoder features from multiple hierarchical levels. In addition, we propose a Multi-Scale Adaptive Hybrid Attention Module (MSAHAM), which adaptively focuses on local details and global contextual information by combining multi-scale spatial attention and channel attention to recover backgrounds occluded by snowflakes of various scales. Finally, we introduce contrastive learning to narrow the domain gap between desnowed outputs and real-world clean scenes, thereby enhancing the model’s generalization ability. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the RVSD benchmark dataset and exhibits superior capability in restoring videos in real-world snowy scenarios.
- MAPT: Memory-Augmented Prompt Tuning at Test-Time for CLIP
Jiaming Yi, Ruirui Pan, Jishen Yang, Xiulong Yang
Abstract: Improving the generalization ability of Vision-Language Pre-trained Models (VLMs) under test-time data distribution shifts remains a critical challenge. Existing Test-Time Adaptation (TTA) methods fall short in fully leveraging the model’s internal knowledge, particularly in dynamically adapting to complex and hierarchical visual semantic information. In this paper, we propose Memory-Augmented Prompt Tuning (MAPT), a novel framework to address this issue. Inspired by human associative memory theory, MAPT introduces a Memory Prompt Bank (MPB), which stores learnable key-value prompt pairs that act as a memory of previously seen samples. At test time, relevant prompt pairs in the MPB are retrieved by the hierarchical visual features of test images to dynamically assemble Associative Prompts. The associative prompts are then injected into the image encoder for fine-grained, customized visual contextual guidance. MAPT also utilizes learnable text prompts. MAPT thus enables rapid, precise VLM adaptation at test time by leveraging this MPB-acquired memory, without source data or retraining. The code is available at https://github.com/Jamieyi2004/MAPT.
- WT-RT-DETR: A Wavelet Detector for Real-Time Wood Defect Detection
Yuefeng Zhao, Junjie Wang, Jinwei Zhang, Qifei Wang, Xianqi Meng, Guicong Zhang, Yuxin Song, Nannan Hu
Abstract: Wood defect detection plays a critical role in ensuring wood quality and minimizing resource waste in industrial production. Advanced object detectors show excellent performance in wood defect detection. However, the texture of wood defects is complex and their categories are diverse, and existing methods still face challenges in detecting defects with small size and unclear boundaries. To address this issue, we propose a novel wavelet detector based on DETR for real-time wood defect detection, called WT-RT-DETR. Firstly, we optimize the network structure by incorporating Partial Self Attention (PSA) as the object detection head, which enhances the multi-scale shallow and deep features. Secondly, we propose an Inverse Haar Wavelet Transform Upsampling module (UP-IHWT), which preserves details and multi-scale information based on the inverse wavelet transform. Finally, we propose the Reparameterized Wavelet Transform Convolution 3 module (RepWTC3) by employing wavelet convolution in RepC3, which strengthens the boundary feature extraction capability of the neck network. The experimental results show that the mAP50 of WT-RT-DETR is 69.1%, higher than state-of-the-art methods, while the model parameters are slightly reduced, striking a better balance between efficiency and accuracy.
- Enhancing 4D Consistency for Mamba-Based Light Field Spatial Super-Resolution
Yu Wang, Ruixuan Cong, Zexin Sun, Da Yang, Zhenglong Cui, Siyang Li, Shuai Wang, Hao Sheng
Abstract: Light field spatial super-resolution (LFSSR) relies on the comprehensive complementarity of 4D information. Existing CNN-based methods suffer from limited receptive fields, while Transformer-based approaches incur prohibitive quadratic computational complexity. Although recent Mamba-based methods can model global dependency with linear complexity, the sequential scanning mechanism of Mamba inherently disrupts the intrinsic 4D local dependency of the light field image. Moreover, most of them encode the subspaces of the light field in a decoupled manner, lacking explicit holistic modeling. In this paper, we propose LF Omnidirectional Mamba (LFOmniMamba) to address these challenges through two key components. First, the Domain-Enhanced Mamba (DEMamba) preserves and enhances local 4D consistency when processing light field subspaces via its internally integrated LF Neighbor-Guided Block. Second, the Pseudo-4D Mamba (P4DMamba) achieves efficient holistic global feature refinement through multi-scale 4D Mamba modeling. Building upon LFOmniMamba, we further propose LFOmniSR for comprehensive 4D feature extraction in LFSSR. Extensive benchmark experiments demonstrate that LFOmniSR outperforms state-of-the-art methods.
- Leveraging Spatiotemporal Semantic Features for Skeleton-Based Action Recognition
Hailun Xia, Naichuan Zheng, Yuqi Yang
Abstract: Skeleton-based action recognition is essential in video analysis. While Graph Convolutional Network (GCN)-based methods effectively represent spatio-temporal features, they often struggle with fine-grained actions, leading to a decline in classification performance. To address this issue, we propose a novel method inspired by spatio-temporal semantic learning, which explicitly enhances feature discrimination by emphasizing differences between actions through spatio-temporal semantics. Our method introduces a plug-and-play spatio-temporal semantic extraction module (STSM), which encompasses the separate extraction of spatio-temporal features and the clustering of similar semantics for different actions. By adjusting the distance between semantic categories, the feature discrimination of the model is enhanced. Furthermore, the STSM is integrated into various stages of GCNs to extract multi-level spatio-temporal semantic features, and a spatio-temporal semantic loss is constructed for more effective supervision. Extensive experiments on three datasets demonstrate that our model leads to superior classification accuracy and outperforms advanced methods.
- Augmentation of LiDAR Scenes with Adverse Weather Conditions Using Latent Diffusion Models
Andrea Matteazzi, Michael Arnold, Dietmar Tutsch
Abstract: LiDAR scenes constitute a fundamental source of training data for numerous autonomous driving applications. However, diverse scenes with adverse weather conditions are rarely available, limiting the robustness of downstream machine learning models and the reliability of autonomous driving systems in such conditions. Collecting diverse scenes under adverse weather is challenging due to seasonal limitations. Generative models represent the current state-of-the-art in data generation, and we therefore consider them for augmenting specific driving scenarios with adverse weather conditions. In this paper, we propose a latent diffusion process combining an autoencoder and a latent diffusion model to simulate adverse weather conditions by applying a diffusion and denoising process to clear weather scenes. We further improve realism by applying a postprocessing step to enhance the generated adverse weather scenes. We create a 3D object detection benchmark for adverse weather conditions based on a publicly available dataset and compare our augmentation method against several baselines. Our best model, trained on both the original data and our augmentation with adverse weather conditions, achieves an 8.8 mAP improvement over the baseline model trained without augmentation. Code: https://github.com/matteandre/AWC-LDM.
- DLRF-Net: A Decomposition-Driven Network for Low-Rank Infrared and Visible Image Fusion
Wei Gao, Youning Wei, Yu Zhang, Peng Yang, Wei Mao, Yi Shi
Abstract: Infrared-visible image fusion plays a vital role in multi-modal vision tasks such as surveillance, navigation, and target recognition. While deep learning has advanced the field, most fusion networks are still empirically designed, lacking theoretical interpretability. To address this, we propose DLRF-Net, an end-to-end framework built upon a mathematically formulated low-rank decomposition model. By reformulating fusion as a constrained optimization problem, DLRF-Net embeds its solution into a structured convolutional network, effectively disentangling low-rank and sparse components for interpretable feature representation. A hierarchical loss further enforces multi-level constraints to preserve visible details and highlight infrared saliency. Experiments on benchmark datasets show that DLRF-Net outperforms state-of-the-art methods in both visual quality and quantitative metrics, while maintaining a compact architecture, validating the effectiveness of representation-guided network design.
- Teacher-Student Consistent Distillation for Source-Free Domain Adaptation Object Detection
Yangfan Wang, Hongyang Yu, Xiying Li
Abstract: Source-Free Domain Adaptation Object Detection (SFOD) aims to adapt a source-pretrained detector to the target domain, using only unlabeled target domain data and without any data from the source domain. Most existing methods follow the Mean-Teacher self-training paradigm. However, the inherent domain shift between the source pretrained model and the target domain data results in noisy and false pseudo-labels, limiting detection performance. To address this problem, we propose an improved Mean-Teacher method, Teacher-Student Consistency Distillation (TSCD), which introduces a feature distillation regularization term to enhance the consistency of the Mean-Teacher framework. Specifically, we first introduce a feature fusion-alignment mechanism. The feature fusion network, cascading multiple attention fusion modules, aggregates domain-invariant knowledge for the student network. Aligning fused features can implicitly provide cross-level consistency. Then, we design a novel feature distillation loss, mining the easily ignored regions caused by domain shift. Finally, we introduce a weighting strategy for the distillation loss, which dynamically allocates weights to each sample pair. Extensive experiments on multiple SFOD benchmarks show that our proposed method achieves competitive performance compared to related methods, demonstrating the effectiveness of our Teacher-Student Consistency Distillation method.
- Optimized Dynamic Snake Convolution Module for Accurate Crack Segmentation
Dianwen Li, Jianming Zhang, Gan Cheng, Fangli Duan, Yan Gui
Abstract: Crack segmentation plays a crucial role in assessing pavement technical conditions. The irregular tubular shape of cracks makes segmentation a challenging task. Dynamic snake convolution (DSConv) enhances feature extraction of tubular structures through deformation, but it comes with significant computational overhead. Through an analysis of the offsets of the convolutional kernel in DSConv, we observe that the offsets tend to cluster around their mean values. To achieve high segmentation accuracy without introducing significant computational overhead, an optimized dynamic snake convolution module (ODSCM) is proposed for constructing a crack segmentation network. Instead of applying DSConv to all feature channels, the proposed ODSCM replaces DSConv with horizontal convolution (HC) and vertical convolution (VC) on only some channels. To compensate for the decrease in adaptive capability caused by this strategy, an offsets-guided feature optimization module (OFOM) is proposed, which optimizes the features extracted by HCs and VCs through the guidance of the offsets of the convolutional kernel in DSConv. Comprehensive experiments are conducted on three datasets: Deepcrack, CrackLS315, and CFD. The results demonstrate that our method surpasses several state-of-the-art methods. The data and source code will be made public at https://github.com/name191/DSCCSNet.
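The channel-splitting idea, replacing DSConv with plain horizontal and vertical strip convolutions on part of the channels, can be sketched as follows; the kernel length and the even 50/50 split are illustrative assumptions.

```python
# Hedged sketch: horizontal (1xk) and vertical (kx1) strip convolutions applied
# to separate channel halves, a cheap stand-in for deformable snake kernels.
import torch
import torch.nn as nn

class HVConv(nn.Module):
    def __init__(self, channels: int, k: int = 9):
        super().__init__()
        assert channels % 2 == 0, "simple 50/50 split assumes an even channel count"
        half = channels // 2
        self.hc = nn.Conv2d(half, half, (1, k), padding=(0, k // 2))  # horizontal strips
        self.vc = nn.Conv2d(half, half, (k, 1), padding=(k // 2, 0))  # vertical strips

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=1)
        return torch.cat([self.hc(a), self.vc(b)], dim=1)
```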
- LABA-Cyclegan: LAB-Enhanced Cyclegan with Spatial-Channel Attentive for Underwater Image Enhancement
Yongxing Hong, Haisen Li, Jing Wang, Zhanfei Peng
Abstract: Underwater image enhancement technology plays a critical role in marine exploration and robotic operations. However, due to the absorption and scattering of light in water, underwater images commonly suffer from color distortion, low contrast, and loss of detail. This paper proposes an adaptive enhancement network based on LAB color space enhancement and attention mechanisms, which decouples and separately processes luminance and chrominance information in the LAB color space, enhancing them using different approaches. Furthermore, we design an improved attention module that integrates a 3D convolution-based multi-head self-attention mechanism with depthwise separable convolutions, effectively capturing long-range dependencies while preserving local features. Experimental results demonstrate that our method performs excellently in complex underwater conditions, exhibits strong cross-scenario consistency, and produces natural and realistic images. The proposed solution significantly outperforms existing methods in enhancing image details and effectively addresses issues such as color distortion, uneven illumination, and detail loss, providing an innovative approach to underwater image enhancement.
- A Category-Guided Keypoint Detection Framework for Industrial Binary Images
Tianqi Ni, Yueche Chen, Xubin Wen, Siyu Xia
Abstract: In industrial keypoint detection, image binarization is commonly used to extract industrial component contour regions, but this process often loses structure-related contextual information while resulting in extremely sparse foreground regions containing critical structures, severely degrading keypoint localization accuracy. To address these challenges, we propose a category-guided keypoint detection framework specifically designed for industrial binary images. The framework introduces a semantic classification embedding module integrated into a U-Net backbone, which explicitly embeds category information to effectively guide keypoint localization. Additionally, we design a balanced hybrid loss function based on regional stratification and normalized weighting, significantly enhancing the stability and precision of heatmap predictions under foreground-background imbalance. To validate our approach, we construct and publicly release an industrial dataset comprising three geometrically distinct insulator categories, providing annotations for both keypoint localization and classification. Experimental results demonstrate that our method achieves superior accuracy under sparse, imbalanced, and binarized conditions. Our framework provides a practical solution for precise and stable keypoint detection in structured industrial applications. Code and dataset are available at: https://github.com/TianqiNee/CategoryKeypointNet.
- Global Joint Local with Multi-directional Weighted Sparse Model for Infrared Small Target Detection
Junying Li, Xiaorong Hou, Yajian Zeng
Abstract: Infrared small target detection plays a vital role in defense monitoring and aerospace early warning. However, due to the target’s small size and the low signal-to-noise ratio, detection remains challenging in complex backgrounds. Low-rank sparse decomposition (LRSD) methods achieve detection by modeling structural differences between targets and backgrounds. Nevertheless, most existing methods only impose global low-rank and local smoothness constraints on the background from an additive perspective, ignoring their coupling relationship. This limits their ability to accurately characterize complex backgrounds. Moreover, the commonly used isotropic \(L_1\) norm fails to fully exploit the multi-directional gradient responses of targets. To address these issues, this paper proposes an improved LRSD model called Global Joint Local with Multi-Directional Weighted (GJL-MDW). It introduces a joint regularization to simultaneously capture global and local background structures, and a multi-directional weighted \(L_1\) norm to describe target sparsity, adaptively enhancing target response while suppressing background interference. Experiments on public datasets demonstrate that GJL-MDW outperforms state-of-the-art baseline methods in both detection accuracy and robustness.
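For context, LRSD methods of this family typically start from the classical infrared patch-image decomposition, which can be written as

\[ \min_{B,\,T} \;\|B\|_{*} + \lambda \|T\|_{1} \quad \text{s.t.} \quad D = B + T + N, \]

where \(D\) is the patch image, \(B\) the low-rank background, \(T\) the sparse target component, \(N\) noise, \(\|\cdot\|_{*}\) the nuclear norm, and \(\lambda\) a trade-off weight. As the abstract above summarizes, GJL-MDW departs from this baseline by coupling the global low-rank and local smoothness terms and by replacing the isotropic \(\|T\|_{1}\) with a multi-directional weighted variant; the formulation shown here is the generic model, not the paper's exact objective.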
- Hermite Curves and Multi-kernel Fusion for Large-Angle Curved and Rotated Text Spotting
Jiawhen Chen, Bo Wu, Tianyue Chen, Chunhua Deng
Abstract: Text spotting for large-angle curved and rotated coupled text in complex industrial scenarios faces persistent challenges. Traditional character-level annotation-based methods struggle to adapt to dynamic deformations, while existing regression-based end-to-end networks exhibit limited modeling capabilities for complex shapes, leading to significant regression errors in large-angle curved scenarios and severely compromised robustness in rotated text detection. To address the geometric modeling difficulties of large-angle curved text, in this paper, we propose a parameterized modeling method based on piecewise cubic Hermite interpolation curves. By leveraging derivative continuity constraints, it generates highly smooth text boundaries and significantly improves the fitting accuracy for curved text. Concurrently, we construct a dynamic feature alignment layer named HermiteAlign by integrating orthogonal sampling grids with bilinear interpolation, effectively alleviating feature distortion. To overcome the robustness bottleneck in detecting large-angle rotated text, we introduce a multi-kernel bounding box fusion mechanism. This approach dynamically selects the optimal bounding box through semantic segmentation and four-directional sub-region transformation, combined with semantic completeness evaluation, ensuring rotation-invariant feature representation. Furthermore, to validate the effectiveness of the proposed method in industrial scenarios, we provide a dataset containing abundant instances of large-angle curved and rotated text. Extensive experiments demonstrate that our method achieves state-of-the-art performance, providing an efficient solution for large-angle deformed text spotting in complex industrial environments.
- Enhancing Interpretability for Fine-Grained Vessel Recognition
Wolodymyr Krywonos, Angelo Cangelosi
Abstract: Ship classification and maritime applications of computer vision have received considerable attention; yet, despite being a safety-critical field, the models utilized remain opaque and therefore untrustworthy. Results on existing maritime datasets highlight a need for vessel-specific explainable techniques due to factors such as vessel variation. We adapt prototypical part-based explanation techniques to handle long-tailed feature distributions, which often cause misalignment between human and machine explanations for fine-grained vessel recognition. We introduce CRP-PIPNet to provide more explainable, interactive models that human operators can potentially use for future tasks. We frame the prototype impurity problem as a form of polysemanticity, utilizing gradient-based attribution in conjunction with clustering methods to create gradient cluster centres which act as true prototypes, increasing prototype purity. By combining and splitting prototypes, a purer model can be obtained, with cleaner representations for downstream tasks. We validate this with a series of quantitative evaluations based on an industry-standard resource, Janes’ Fighting Warships, to highlight the potential of achieving greater trust through closer alignment between machine learning models and humans.
- STM-SalNet: A Biologically-Inspired Spatial-Temporal Memory Network for Video Saliency Prediction
Jikai Xu, Dandan Zhu, Kaiwei Zhang, Xiongkuo Min
Abstract: In recent years, video saliency prediction has attracted significant attention across a wide range of vision-related tasks. However, most existing video saliency prediction methods predominantly rely on static encoder-decoder architectures, failing to incorporate the dynamic memory mechanisms that are fundamental to human visual perception and attention modeling. To address this limitation, we propose STM-SalNet, a novel biologically-inspired spatial-temporal memory network for video saliency prediction. First, inspired by the powerful visual processing capabilities of the human visual cortex, we introduce a brain-inspired Vision Transformer module designed to extract multi-level hierarchical spatial-temporal features. Subsequently, we propose a memory bank module equipped with an active forgetting mechanism, simulating human memory’s ability to selectively retain and update information. By dynamically retrieving relevant features from past frames while discarding redundancy, the module ensures robust adaptability to continuously evolving video content. To further enhance the integration of spatial and temporal features, we design a bidirectional spatial-temporal fusion module that facilitates effective interaction between deep semantic and shallow spatial features, enriching the overall feature representation. Finally, a progressively hierarchical decoder module is employed to generate fine-grained, pixel-wise saliency maps that closely align with ground truths. Extensive experiments on the DHF1K, Hollywood-2, and UCF-Sports benchmark datasets demonstrate that our proposed STM-SalNet achieves competitive performance compared to existing state-of-the-art methods.
- Pose-Guided Cross-Modal Knowledge Distillation for Visible-Infrared Person Re-identification
Sheng Zhang, TongJiaHao Teng, XiaoWei Zhang
Abstract: Visible-Infrared Person Re-identification (VI-ReID) aims to retrieve person images across visible and infrared modalities, a task challenged by significant cross-modal discrepancies and intra-class variations. Existing methods primarily focus on matching images based on global or latent features, yet often neglect the prior information of human poses and the intrinsic identity-related cues guided by pose structures. To address this issue, we propose a Pose-Guided Cross-Modal Knowledge Distillation (PG-CMKD) framework, which enhances feature representation through pose-guided semantic alignment at both modality-specific and shared feature levels, thereby improving re-identification performance. Specifically, in order to compensate for the information loss caused by modal differences in cross-modal data, we first extract modality-specific and shared features of persons. Then, we introduce learnable prototypes to extract body joints and progressively learn cross-modal semantically invariant features of pose keypoints under their guidance, ensuring robust representations. These features are further combined with original modality features through local-part knowledge distillation to reinforce semantic consistency. Finally, global knowledge distillation is employed to explore latent inter-modal relationships within both specific and shared features and realize cross-modal semantic alignment, thereby strengthening identity representations. Extensive experiments on multiple public datasets demonstrate the superior performance of our PG-CMKD framework and provide new insights into leveraging structural information for robust cross-modal person re-identification.
- FEFusion: A Frequency-Domain Enhancement Method for Rainy Infrared and Visible Image Fusion
Jianlou Lou, Xinyu Sheng, Jianxun Lou
Abstract: Existing infrared and visible image fusion methods lack resistance to adverse weather degradation and interference in complex rainy environments. In addition, these methods rarely integrate frequency-domain information effectively during feature extraction, failing to effectively separate degenerate components that exhibit significant differences in their frequency-domain characteristics. To address these limitations, this paper proposes a frequency-domain enhanced fusion method (FEFusion) for infrared and visible images. First, this paper introduces a Histogram Transformer (Histoformer), which employs a binning-based self-attention mechanism to dynamically group pixel intensities into different ranges. This enables adaptive focusing on degraded regions and facilitates cross-modal feature association. Second, we design a Frequency Domain Enhancement Block (FDEB) to decompose high-frequency rain-streak noise and low-frequency fog-like effects in the frequency domain. Subsequently, a Low-light Enhancement Module (LEM) is applied to enhance details in dark regions, while a High-Low Frequency Interaction Module (H-LFIM) achieves cross-frequency feature modulation. Finally, the experimental results verify the effectiveness of the method proposed in this work.
- CQM: Algorithm for Video Moment Localization and Highlight Segment Detection Based on Conditional Query
Yude Wang, Xinyu Wang, Fei Song, Chuanxin Liu, Zongqiang Liu
Abstract: In the task of video moment localization and highlight segment detection, it is crucial to understand the correlation between the video content and the query text, because the video usually contains dynamic temporal information while the text provides semantic and contextual descriptions. The effective fusion of the two can help the model capture the connection between events and the complete context more accurately. However, some existing methods tend to ignore this contextual integration, usually adopting simple feature splicing or global average pooling for information fusion, which cannot fully utilize the temporal characteristics in video and the semantic details in text. In addition, some methods lack a flexible cross-modal alignment strategy when processing video and text, so the potential associations between the two are not effectively captured. To solve the above problems, this paper proposes a conditional query-based algorithm for video moment localization and highlight segment detection (Conditional Query Model, CQM). The algorithm designs a conditional query decoder that enhances the performance of the conditional cross-attention mechanism by introducing conditional location queries. Specifically, CQM introduces conditional location queries during the decoding process, which enable each cross-attention head to focus on specific time intervals in the video clip, such as the start moment of a key event, the end moment, or a salient region within the clip.
- Attention Lattice Adapter: Visual Explanation Generation for Visual Foundation Models
Shinnosuke Hirano, Yuiga Wada, Tsumugi Iida, Komei Sugiura
Abstract: In this study, we consider the problem of generating visual explanations in visual foundation models. Numerous methods have been proposed for this purpose; however, they often cannot be applied to complex models due to their lack of adaptability. To overcome these limitations, we propose a novel explanation generation method for visual foundation models that is aimed at both generating explanations and partially updating model parameters to enhance interpretability. Our approach introduces two novel mechanisms: the Attention Lattice Adapter (ALA) and the Alternating Epoch Architect (AEA). The ALA mechanism simplifies the process by eliminating the need for manual layer selection, thus enhancing the model’s adaptability and interpretability. Moreover, the AEA mechanism, which updates ALA’s parameters every other epoch, effectively addresses the common issue of overly small attention regions. We evaluated our method on two benchmark datasets, CUB-200-2011 and ImageNet-S. Our results showed that our method outperformed the baseline methods in terms of mean intersection over union (IoU), insertion score, deletion score, and insertion-deletion score on both the CUB-200-2011 and ImageNet-S datasets. Notably, our best model achieved a 53.2-point improvement in mean IoU on the CUB-200-2011 dataset compared with the baselines.
- TRADNet: Temporal and Regional-Aware Diffusion Model for Point Cloud Generation
Yuanhao Yang, Jinlai Zhang, Yan Su, Ong Zhi Chao, Du Xu, Lairong Yin, Lin Hu
Abstract: Generating high-fidelity 3D point clouds is challenging, especially in preserving fine details and structural consistency. In this paper, we propose TRADNet, a Temporal and Regional-Aware Diffusion Network that enhances generation via temporal adaptation and spatial supervision. It comprises: a Timestep-Aware Feature Recalibration (TAFR) module to dynamically balance global-local features, a Detail-Aware Attention Fusion (DAAF) module using multi-scale convolution and attention to refine local structure, and a Region-wise Noise Loss supervising sub-region noise to improve local geometry. Experiments on ShapeNetV2 show TRADNet achieves state-of-the-art 1-NN accuracy across the Chair, Airplane, and Car categories, the most reliable measure of generative quality. On the Chair class, TRADNet surpasses TIGER by 1.18% CD and 2.10% EMD, validating the effectiveness of integrating temporal adaptivity and regional supervision into diffusion models.
- Harmonizing Classification and Localization in Small Object Detection
Enhui Chai, Li Chen, Liu Wei, Tianxiang Cui
Abstract: In the field of Small Object Detection (SOD), accurate classification and localization are crucial for detection performance. However, the inherent imbalance between classification and localization tasks can generate conflicting priorities, leading to suboptimal task coordination for small object detection. This imbalance is mainly caused by the different attention regions and the gradient competition between the two tasks during joint training. In this paper, we propose a Dual-Task Harmonization Framework (DTHF). First, we introduce a Feature Fusion-based Data Augmentation strategy (FF-DA), which amplifies boundary-aware patterns for localization while preserving critical semantic regions for classification, thereby aligning their region-of-interest priorities. Second, we design a Gradient Equilibrium Module (GEM) that dynamically balances tasks by altering the gradients, preventing one task from overwhelming the other during optimization. Experiments on the MS COCO and VisDrone datasets demonstrate that, compared to the baseline model, our method improves mAP on the VisDrone dataset by over 2.0%. Ablation studies validate that both FF-DA and GEM contribute synergistically, offering a unified solution to task imbalance in small object detection.
- Parallel Attention-Based Asymmetric Feature Decomposition and Recovery for Domain Generalization Person Re-identification
Hangyuan Yang, Yongfei Zhang, Siyu Chen, Shan Yang, Yanglin Pu, Yongjun Wang
Abstract: Supervised Person Re-identification (ReID) suffers from severe performance degradation on unseen domains due to domain gaps. To address this issue, we design a Domain Generalization (DG) ReID framework that is both generalizable and discriminative. In this framework, we propose a Parallel Attention-based Feature Decomposition and Recovery (PAFDR) module. PAFDR combines Batch Normalization (BN) and Instance Normalization (IN) to reduce the domain gap, but normalization inevitably removes discriminative information. We attempt to decompose identity-relevant features from the removed information and add them back to the network to enhance discrimination. However, existing methods only focus on the channel aspect and ignore spatial decomposition, leading to incomplete spatial decomposition of identity-relevant/irrelevant features. PAFDR employs parallel spatial and channel attention for a more thorough decomposition and recovery of identity-relevant features. Its parallel structure provides a regularization-like effect, improving generalization ability. Furthermore, existing loss functions use symmetric constraints, hindering thorough feature decomposition. We propose an Asymmetric identity-relevant Feature Decomposition (AIFD) loss that applies asymmetric constraints to features to match appropriate comparison objects, promoting thorough decomposition of identity-relevant/irrelevant features. Experiments show that our method outperforms existing DG ReID methods.
- Taylor-Augmented Transformer-Mamba Architecture for Egocentric Action Recognition
Xinyue Fan, Dandan Sun, Hailun Xia, Mingyu Mao, Junjiang Liu
Abstract: Egocentric action recognition aims to accurately model hand-object interactions. However, existing methods are highly susceptible to background noise and face challenges in balancing long-range and local feature modeling, alongside the high computational costs associated with processing high-resolution image sequences. To address these limitations, we propose a Taylor-Augmented Transformer-Mamba hybrid architecture (TATM). We first introduce Taylor Augmentation, a novel method based on Taylor frames, which employs a dynamic modality replacement strategy to generate a diverse training sample distribution, thereby enhancing model robustness against background interference. Additionally, we incorporate predicted object categories and decoded hand poses as part of the action recognition input, and design a MambaAction block adapted to Taylor-augmented data, which is integrated into the Transformer encoder. This hybrid framework enhances the modeling of hand-object interactions and effectively mitigates the trade-off between long-range dependency modeling and computational efficiency. Extensive experiments demonstrate that our approach doubles the inference speed for high-resolution images and achieves significant performance improvements on two skeleton-based egocentric action recognition benchmarks, FPHA and H2O.
- Learning Monocular Depth from Events via Egomotion Compensation
Haitao Meng, Chonghao Zhong, Sheng Tang, JunJia Lian, Wenwei Lin, Zhenshan Bing, Yi Chang, Gang Chen, Alois Knoll
Abstract: Event cameras are neuromorphically inspired sensors that sparsely and asynchronously report brightness changes. Their unique characteristics of high temporal resolution, high dynamic range, and low power consumption make them well-suited for addressing challenges in monocular depth estimation (e.g., high-speed or low-lighting conditions). However, existing methods primarily treat event streams as black-box learning systems, without incorporating prior physical principles. As a result, they become over-parameterized and fail to fully exploit the rich temporal information inherent in event camera data. To address this limitation, we incorporate physical motion principles and propose a high-accuracy monocular depth estimation framework, in which the likelihood of different depth hypotheses is explicitly determined by motion compensation based on the egomotion of the event camera. Specifically, we introduce a Focus Cost Discrimination (FCD) module that assesses edge sharpness as a key indicator of focus, while also integrating spatial context to facilitate more accurate cost estimation. Furthermore, we analyze the noise patterns within our framework and enhance it with a newly proposed Inter-Hypotheses Cost Aggregation (IHCA) module. This module refines the cost volume through trend prediction and multi-scale cost consistency constraints. Extensive experiments on real-world and synthetic datasets demonstrate that our framework achieves a 22% relative reduction in absolute relative error, highlighting its superior accuracy in monocular depth estimation.
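Edge sharpness as a focus proxy is a standard contrast-maximization idea; one common instantiation, shown below with the variance-of-Laplacian measure as an assumption, scores each depth hypothesis by how crisp its motion-compensated event image looks.

```python
# Hedged sketch of a focus score for an image of warped events (IWE): sharper
# compensation (i.e., a better depth hypothesis) yields a higher Laplacian
# variance. The variance-of-Laplacian measure is an assumption; the paper's
# FCD module also integrates spatial context.
import torch
import torch.nn.functional as F

def focus_score(iwe: torch.Tensor) -> torch.Tensor:
    # iwe: (B, 1, H, W), one motion-compensated event image per depth hypothesis
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)
    response = F.conv2d(iwe, lap, padding=1)
    # Higher variance = sharper edges = more plausible depth hypothesis.
    return response.flatten(1).var(dim=1)
```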
- JP-Occ: 3D Occupancy Prediction with View Transformation Based on Joint Projection
Wei Liu, Guixing Xu, Zuotao Ning, Qi Guo, Junxin Jin, Biao Xu, Shuai Cheng, Liuyu Pei
Abstract: In autonomous driving, vision-based 3D occupancy perception has received widespread attention for accurately representing 3D scenes compared to Bird’s-Eye-View (BEV) representations. A key step in vision-based 3D occupancy prediction involves view transformation, which converts 2D image features into a 3D voxel representation. Currently, the two most prominent view transformation paradigms are forward projection based on depth estimation and backward projection based on transformers. However, these two paradigms face inherent limitations. The former produces sparse voxel features, while the latter often focuses on unoccupied regions because of the absence of geometric prior information. To address these challenges, we propose JP-Occ, a joint forward-backward projection occupancy prediction framework that leverages the strengths of both paradigms to obtain high-quality voxel features. Our framework first employs an Occupancy-Aware Forward Projection module to generate initial voxel features and query proposals, followed by a Geometry-Guided Backward Projection module to enrich and densify the voxel features. In addition, we introduce a plug-and-play BEV Feature Aggregation module to enhance the global representation of voxel features. Experiments conducted on the Occ3D-nuScenes dataset demonstrate that JP-Occ achieves promising performance with high computational efficiency, underscoring the effectiveness and superiority of our approach.
- Advancing Audio-Visual Navigation Through Multi-Agent Collaboration in 3D Environments
Hailong Zhang, Yinfeng Yu, Liejun Wang, Fuchun Sun, Wendong Zheng
Abstract: Intelligent agents often require collaborative strategies to achieve complex tasks beyond individual capabilities in real-world scenarios. While existing audio-visual navigation (AVN) research mainly focuses on single-agent systems, their limitations emerge in dynamic 3D environments where rapid multi-agent coordination is critical, especially for time-sensitive applications like emergency response. This paper introduces MASTAVN (Multi-Agent Scalable Transformer Audio-Visual Navigation), a scalable framework enabling two agents to collaboratively localize and navigate toward an audio target in shared 3D environments. By integrating cross-agent communication protocols and joint audio-visual fusion mechanisms, MASTAVN enhances spatial reasoning and temporal synchronization. Through rigorous evaluation in photorealistic 3D simulators (Replica and Matterport3D), MASTAVN achieves significant reductions in task completion time and notable improvements in navigation success rates compared to single-agent and non-collaborative baselines. This highlights the essential role of spatiotemporal coordination in multi-agent systems. Our findings validate MASTAVN’s effectiveness in time-sensitive emergency scenarios and establish a paradigm for advancing scalable multi-agent embodied intelligence in complex 3D environments.
- EMambaEdgeNet: Featuring Feature Decoupling and Geometry-Aware Fusion with Mamba Architecture
Yuhan Duan, Lu Che, Xin Cheng, Zhiqiang Zhang, Wenxin Yu
Abstract: Remote sensing image semantic segmentation, as the core technology of geographic information interpretation, achieves precise extraction of ground objects through pixel-level classification, and has important application value in fields such as land planning and environmental monitoring. However, existing methods generally face the dual challenges of edge blurring and multi-scale target representation conflicts when processing high-resolution images: traditional convolutional networks are limited by local receptive fields, leading to discontinuous boundaries, while mainstream Transformer architectures have difficulty balancing detail preservation and multi-scale modeling due to high computational complexity. To address these issues, this study proposes an edge-enhanced segmentation framework, EMambaEdgeNet, innovatively constructing a three-stream feature decoupling architecture. Inheriting the global-local feature modeling advantages of RS3Mamba, it breaks through the perceptual bottleneck of traditional dual-branch models by introducing an edge feature enhancement branch. The core innovations of this method include: a multi-level feature erasure-residual reconstruction module based on a reverse attention mechanism to enhance features in high-detail regions; a multi-directional geometric perception module combining dynamic separable convolution and rotation-invariant gradient operators to improve complex boundary direction discrimination; and a bidirectional cross-layer fusion mechanism with channel-spatial dual attention gating to promote dynamic complementarity of multi-scale features. Experiments on the ISPRS Potsdam and Vaihingen datasets demonstrate the effectiveness of our proposed model in edge preservation and multi-scale target segmentation in complex scenarios. To our knowledge, this is the first three-stream feature decoupling model for remote sensing image semantic segmentation.
- Occluded Low-Layer Tree Detection Method Based on Dynamic Graph Convolutional Neural Network
Wei Lv, Wenxuan Jin, Wenyuan Ying
Abstract: Existing methods for single tree detection based on LiDAR point clouds have achieved extraordinary performance. However, the presence of occluded low-layer trees poses a challenge to existing methods for accurately detecting single trees. To improve the detection performance for low-layer trees, we present a method based on the dynamic graph convolutional neural network (DGCNN). First, the method utilizes a graph-based algorithm to roughly determine the contour of the top crown on the canopy height model and expands it into circular regions. Each circular region is used as a detection sample. Then, a DGCNN-based classifier is used to classify the detection samples, and the samples with low-layer crowns are retained. Finally, to determine the position of the low-layer tree crowns, the statistical analysis of point clouds is used to remove the points belonging to the top-layer tree crown. To verify the effectiveness of our method, experiments are conducted on two different experimental areas. The experimental results indicate that our method achieves the highest matching score compared with the other four methods. Specifically, our method achieves an overall matching rate of 88.32% on five experimental plots containing low-layer trees, with an overall commission rate of only 42.30%. The numbers and rates of omission generated by our method during detection are significantly lower than those of the compared methods. All the results demonstrate that our method can effectively detect low-layer trees and exhibit universality across various scenarios.
- Distill4Geo: Streamlined Knowledge Transfer from Contrastive Weight-Sharing Teachers to Independent, Lightweight View Experts
Muhammad Haad Zahid, Murtaza Taj
Abstract: Cross-View Geo-Localization (CVGL) aims to align images from different perspectives (e.g., satellite and street views) to a shared geographic location, a complex task due to variations in viewpoint, intricate scene geometry, and visual discrepancies across views. Current methods commonly employ contrastive loss, which requires matching and non-matching (negative) pairs and often demands large batch sizes, leading to significant training overhead. This challenge is compounded in weight-sharing models, which, while typically achieving better accuracy, incur high parameter and computational costs. We introduce a novel knowledge distillation approach that trains lightweight, view-specific student models without weight sharing. Optimized with a cosine embedding-based dual distillation loss, our method eliminates the need for large batch sizes. We also introduce augmentation noise to improve the student models’ pairwise generalization. Our approach reduces parameters by \(3\times \) and GFLOPs by over \(13.5\times \), achieving state-of-the-art (SOTA) accuracy on leading cross-view datasets, including CVUSA, CVACT, and VIGOR.
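A minimal sketch of the dual cosine-embedding distillation described above follows; the symmetric sum and the detached teacher embeddings are assumptions about details the abstract leaves open.

```python
# Hedged sketch of dual distillation with a cosine embedding objective: each
# view-specific student matches its frozen teacher's embedding, so no negative
# pairs or large contrastive batches are needed. Loss weighting is assumed.
import torch
import torch.nn.functional as F

def dual_distill_loss(street_stu: torch.Tensor, sat_stu: torch.Tensor,
                      street_tea: torch.Tensor, sat_tea: torch.Tensor) -> torch.Tensor:
    # All inputs: (B, D) embeddings; teachers come from the weight-sharing model.
    l_street = 1.0 - F.cosine_similarity(street_stu, street_tea.detach(), dim=1).mean()
    l_sat = 1.0 - F.cosine_similarity(sat_stu, sat_tea.detach(), dim=1).mean()
    return l_street + l_sat
```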
- PCSR: A Two-Stage Prompt-Guided and Semantic Refinement Framework for CLIP-Based Few-Shot Learning
Junying Zhong, Fufang Li
Abstract: Vision-language models such as CLIP have demonstrated strong generalization in few-shot learning. Existing methods typically freeze CLIP’s visual encoder and rely on shallow classifiers and language models to generate class-level prompts for similarity matching. However, under extremely low-shot conditions, they face two main challenges: limited fine-grained discrimination and weak cross-modal interaction due to static feature matching. To address these challenges, we propose a two-stage framework, Prompt-guided Contrastive Adaptation and Cross-modal Semantic Refinement (PCSR), to enhance semantic alignment and generalization in few-shot vision-language tasks. In Stage I, we inject trainable dynamic prompts into CLIP’s visual and textual encoders. The model uses local image regions as input, guided by prompts to extract discriminative features, and learns via an image–text contrastive loss to adapt the joint embedding space. In Stage II, based on the frozen model adapted in Stage I, we introduce a cross-modal cross-attention mechanism. Local features are used to construct dynamic key-value pairs that drive collaborative modeling between global image representations and CLIP’s text classifier weights, thereby enhancing the model’s perception of fine-grained semantics. Experiments show that PCSR consistently outperforms existing methods on multiple few-shot benchmarks, demonstrating superior fine-grained recognition and cross-modal generalization.
- Semantic Segmentation at Extreme Distances for Maritime Computer Vision
Wolodymyr Krywonos, Angelo Cangelosi
Abstract: Scene understanding is a significant issue for maritime industries. Due to the large ranges involved, a model capable of localization at both short and long distances is particularly important. Currently, modern datasets have neglected the collection procedures and hardware needed to accommodate the scales of ranges at which small vessels appear. Due to other factors such as environmental challenges and data scarcity, current efficient deep learning techniques also fail to achieve high accuracies. We present High Resolution Collection Above Water (HR-CAW), a high-resolution dataset for realistic vessel segmentation, and address limitations of current data-driven approaches with a novel architecture, HC-PatchNet. Our results show that, due to extreme data imbalance and diverse weather conditions, most methods for efficient high-resolution segmentation fail to converge, whereas the proposed method succeeds and achieves state-of-the-art performance.
- Semantic Importance-Based Deep Image Compression Using Diffusion Model
Xingyu Dai, Yuanyuan Xu, Kun Zhu
Abstract: Semantic image compression aims to reduce the amount of data transmitted by leveraging high-level semantic information for image representation and reconstruction. Considering that objects in an image vary in semantic importance, we propose a semantic-aware image compression framework that employs a diffusion model as the decoder to reconstruct visually pleasing images. The latent code generated by an autoencoder is used as a conditional input for the reverse diffusion process, where semantic importance is explicitly incorporated. To enhance the reconstruction of fine-grained details, we design a novel noise scheduling function tailored for conditional diffusion generation. Additionally, a two-stage reverse diffusion process with semantic-aware scheduling is proposed: the first stage focuses on improving the reconstruction quality of semantically critical regions, while the second stage refines the remaining areas. Experimental results demonstrate that the proposed method achieves higher perceptual quality compared to baseline approaches, along with improved pixel fidelity and semantic fidelity for semantically significant objects.
- Title
- Neural Information Processing
- Editors
- Tadahiro Taniguchi
- Chi Sing Andrew Leung
- Tadashi Kozuno
- Junichiro Yoshimoto
- Mufti Mahmud
- Maryam Doborjeh
- Kenji Doya
- Copyright Year
- 2026
- Publisher
- Springer Nature Singapore
- Electronic ISBN
- 978-981-95-4097-6
- Print ISBN
- 978-981-95-4096-9
- DOI
- https://doi.org/10.1007/978-981-95-4097-6
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.