2025 | Book

Intelligence Science V

6th IFIP TC 12 International Conference, ICIS 2024, Nanjing, China, October 25–28, 2024, Proceedings

About this book

This book constitutes the refereed proceedings of the 6th IFIP TC 12 International Conference on Intelligence Science, ICIS 2024, held in Nanjing, China, on October 25–28, 2024.

The 23 full papers and 2 short papers presented here were carefully reviewed and selected from 32 submissions. These papers have been categorized into the following sections: Machine Learning; Causal Reasoning; Large Language Model; Intelligent Robot; Perceptual Intelligence; AI for Science; Medical Artificial Intelligence.

Table of Contents

Frontmatter

Machine Learning

Frontmatter
Difference-Enhanced Learning of the Deep Semantic Segmentation Networks for First Break Picking
Abstract
The precise estimation of seismic arrival times, commonly referred to as first-break picking, is a critical problem in seismic research because of its role in seismological applications such as statics correction processing. In recent years, several deep learning algorithms have been designed specifically for 2D seismic arrival time picking. A widely used approach is to treat the 2D arrival picking problem as a 2D image segmentation problem and employ a deep semantic segmentation model for end-to-end first-break picking. However, the first-break mask generated by this method often violates the uniqueness of the first arrival time in the presence of noise. To alleviate this problem, we propose a difference-enhanced learning method for deep semantic segmentation networks in first-break picking, built on a new kind of loss function that improves both the quality of mask generation and the accuracy of the arrival times. Extensive experiments on a real seismic dataset demonstrate that the proposed difference-enhanced learning method is effective and outperforms conventional learning methods for deep semantic segmentation models in estimating seismic arrival times.
Zhongyang Wen, Jinwen Ma
A Framework of Reinforcement Learning for Truncated Lévy Flight Exploratory
Abstract
Deep reinforcement learning (DRL) still explores insufficiently when dealing with complex tasks that have high-dimensional and large state spaces. Developing better exploration strategies therefore remains one of the important tasks in reinforcement learning. The paper introduces a new exploration strategy, ATLF (Adjustment of Truncated Lévy Flight exploration framework), which augments the existing exploration mode with Lévy flight, making action selection more stochastic to boost exploration. The ATLF framework is combined with the discrete-space algorithm DQN and the continuous-space algorithm SAC to handle reinforcement learning tasks. Compared with a variety of reinforcement learning algorithms on OpenAI Gym environments such as MountainCar-v0 and Walker2d-v2, the results show that our algorithm explores better than vanilla DQN or SAC, obtains higher overall rewards, is less likely to fall into local optima, and is more stable. Additionally, the results show that ATLF is highly compatible with existing deep reinforcement learning algorithms.
Quan Liu, Shile Feng, Zixian Gu
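The abstract above does not spell out how the truncated Lévy flight steps are generated. As a minimal illustration only, the sketch below draws heavy-tailed steps with Mantegna's algorithm (a common way to approximate Lévy-stable steps, assumed here rather than taken from the paper), clips them to a maximum length, and adds them to a continuous action. The function names, the `beta`, `max_step`, and `scale` parameters, and the clipping rule are all hypothetical.

```python
import math
import random


def mantegna_sigma(beta: float) -> float:
    """Scale of the numerator Gaussian in Mantegna's algorithm (0 < beta <= 2)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def truncated_levy_step(beta: float = 1.5, max_step: float = 1.0, rng=random) -> float:
    """Draw one heavy-tailed step and truncate it to [-max_step, max_step]."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero (practically never happens)
        v = rng.gauss(0.0, 1.0)
    step = u / abs(v) ** (1 / beta)
    return max(-max_step, min(max_step, step))


def perturb_action(action: float, low: float, high: float,
                   scale: float = 0.1, beta: float = 1.5, rng=random) -> float:
    """Add truncated Levy noise to a scalar action and keep it inside [low, high].

    Hypothetical helper: how the paper injects the noise into DQN or SAC
    is not described in the abstract.
    """
    noisy = action + scale * truncated_levy_step(beta=beta, rng=rng)
    return max(low, min(high, noisy))


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(truncated_levy_step(rng=rng), 3) for _ in range(5)])
```

Most draws are small, with occasional long jumps up to the truncation bound; that mix of local and far-reaching moves is the usual motivation for Lévy-style exploration.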
Detection of Depression in EEG Signals Based on Convolutional Transformer and Adaptive Transfer Learning
Abstract
Electroencephalography (EEG) signals provide an objective reflection of the inner workings of the brain, making them a promising tool for the diagnosis of depression. However, the classification of EEG signals for depression is severely affected by individual differences among subjects, complex intrinsic properties, and a low signal-to-noise ratio (SNR), which limit classification accuracy. Additionally, traditional convolutional neural networks extract local features but fail to capture long-term dependencies in EEG decoding. To address these issues, we introduce an adaptive transfer learning method based on a convolutional transformer model for depression detection. Experimental results demonstrate the effectiveness of the proposed model on the public MODMA and EDRA datasets, where it reaches best accuracies of 100% and 98.61%, respectively, outperforming some state-of-the-art depression identification methods. Our findings provide new perspectives on the recognition of depression and could serve as an assisted diagnostic tool in the future.
Qianqian Tan, Minmin Miao
Twin Bounded Least Squares Support Vector Regression
Abstract
Support Vector Machine (SVM) has received much attention in machine learning due to its profound theoretical research and practical application results. Support Vector Regression (SVR) has become a powerful tool for solving regression problems. Least Squares Support Vector Regression (LSSVR) has advantages in computing speed but can be prone to overfitting due to its sensitivity to noise and outliers. Additionally, Twin Support Vector Regression (TSVR) shows insufficient flexibility when dealing with large-scale data, and its robustness to noise could be improved. In response to these problems, this paper proposes an innovative solution: Twin Bounded Least Squares Support Vector Regression (TBLSSVR). This model combines the advantages of LSSVR and TSVR and introduces regularization terms to mitigate their limitations effectively. The regularization term helps minimize structural risk, reflecting the advantages of statistical learning theory and ensuring the stability of the solution. Experimental results demonstrate that TBLSSVR improves regression accuracy and significantly accelerates the solution speed, providing new directions and methods for future technology development.
Ran Chen, Muhan Liu, Jinwen Ma
MLEE: Event Extraction as Multi-label Classification Task at Token Level
Abstract
Event Extraction, an important task in natural language understanding that aims to identify the triggers of pre-defined event types and their arguments with specific roles, has attracted a lot of attention from industry and academia. Previous work has failed to address several issues, including error propagation, overlapping and nested events, and high model complexity. This work proposes a novel model, MLEE, which treats Event Extraction as a multi-label classification task at the token level and performs extraction in a joint paradigm, helping to solve the issues mentioned above. Experiments verify our model’s effectiveness. Empirical results on DuEE and FewFC show that MLEE outperforms the previous best model, pushing trigger extraction F1 to 85.03% (+4.45%) and argument extraction F1 to 78.85% (+2.72%) on DuEE, and trigger extraction F1 to 76.69% (+1.63%) and argument extraction F1 to 76.53% (+5.27%) on FewFC.
Jinshun Yang, Shuangxi Huang, Mingfeng Huang
Research on Improvement of Sweeping Learning Chain Algorithm Based on Factor Space Theory
Abstract
To improve the classification accuracy of the sweeping learning chain (SLC) algorithm in factor space and to address SLC's tendency to misclassify samples in the mixed domain, this paper proposes the sweeping learning chain-K-Nearest Neighbor (SLC-KNN) algorithm. When SLC encounters inseparable data, KNN is used to classify the test samples that fall into the mixed domain. This not only solves the problem of inseparable data encountered by SLC but also reduces the computation and storage required by the KNN algorithm. The approach is then extended to multi-class problems, yielding the BT-SLC-KNN multi-classification algorithm. Based on the maximum class-center distance, the algorithm first separates the two classes that are easiest to separate. It then defines the class-center distance on the sweeping vector and generates a normal binary tree step by step by comparing the distances of the remaining classes on the sweeping vector of the two most distant classes. This reduces the time complexity of the merge sweeping learning chain (MSLC) algorithm and improves multi-classification accuracy. Finally, experiments on UCI data sets show that the two algorithms are feasible and effective.
Yaru Liu, Fanhui Zeng, Sihang Ren
End-To-End Control of a Quadrotor Using Gaussian Ensemble Model-Based Reinforcement Learning
Abstract
In recent years, the rapid development of deep reinforcement learning has provided a new way to solve robot control problems. However, the low sample efficiency and slow convergence of deep reinforcement learning remain obstacles when transitioning from simulation to the real world. In this paper, we propose a quadrotor control policy using Gaussian ensemble model-based reinforcement learning. Unlike traditional control methods, this method uses an actor-critic deep neural network, updated with a reward function, to achieve end-to-end control of the quadrotor by establishing a mapping between the quadrotor's states and motor control signals. Additionally, unlike conventional model-free RL methods, we improve sample efficiency by constructing an ensemble model that follows a Gaussian distribution. The environment model is trained using data from the agent's interaction with the real environment and reduces the number of such interactions by generating simulated data. The approach is evaluated in AirSim, a high-fidelity visual and physical simulator. The results show that the proposed approach improves sample efficiency, eliminates oscillations and steady-state error, and demonstrates robustness to external disturbances.
Qiwen Zheng, Qingyuan Xia, Haonan Luo, Bohai Deng, Shengwei Li

Causal Reasoning

Frontmatter
Research on the Causal Forest Algorithm Based on Factor Space Theory
Abstract
To improve the classification accuracy and generalization ability of a single classifier, a causal tree algorithm is proposed based on the degree of determination in factor space theory. Multiple causal trees are then ensembled: a definition of factor importance is given, advantage factors are obtained, importance thresholds are set in numerical experiments to reduce the set of condition factors, and factors are selected randomly, yielding the causal forest algorithm. Experimental comparisons were conducted on classification datasets from the UCI database to comprehensively evaluate the causal tree algorithm, the causal forest algorithm, SVM, and random forest. The experimental results show that the causal forest algorithm performs well in terms of accuracy, precision, recall, F1 value, and AUC, especially on the Vote and Cancer datasets, with good predictive and generalization abilities. The research expands the theoretical and applied study of factor space in data mining.
Fanhui Zeng, Kaile Lin, Xiaotong Liu, Pengxue Zhang, Sixing Ren
Superpositioner – A Non-logical Computation Model
Abstract
We have been striving to exceed computational complexity, and in the process we have come to recognize the dilemma of classical computing and, in turn, that the superpositioner may be a way to resolve it. A superpositioner is a model formed by several Boolean functions whose variables and function values feed back into each other. The component of the superpositioner is the reentry function, which can be fully described by classical logic and calculated by classical computation, but the superpositioner as a whole is a non-logical entity, and it is impossible for classical computation to fully compute it. In this article, we present the concept of a superpositioner and discuss its basic properties. We find that the superpositioner + dispositioner will form a new type of computation model whose capabilities can surpass Turing computation. We envision that this new model will help implement the following functions in an intelligent agent: a whole new way of programming and a whole new way of learning, endogenous feelings, analogies and associations, forming understanding, dynamic action, participating in the formation of subjectivity, and more. We also discuss how to implement a superpositioner in the most preliminary way.
Chuyu Xiong
Factor Analog Reasoning Model and Its Solution Research
Abstract
Analogical reasoning, in which people use existing knowledge to reason, is one of the most common forms of thinking and a key phenomenon of human intelligence. However, analogical reasoning can give inaccurate results even on simple reasoning tasks, and it becomes computationally expensive on complex inference tasks over big data. To address these problems, based on the [U, I] image matching principle in factor space theory, this paper proposes a factor analogical reasoning model, gives the steps of the factor analogical reasoning algorithm, analyzes an example from the UCI data set, and compares the proposed algorithm with the factor analysis algorithm. The results show that the proposed factor analogical reasoning algorithm reasons effectively on analogical reasoning problems and has the advantages of accurate results and short calculation time. The conclusions on analogical reasoning based on factor space expand the theory and application of factor space.
Tianyuan Wang, Fanhui Zeng, Sixing Ren
Research on Factor Support Vector Multi-classification Algorithm Based on Factor Space Theory
Abstract
To address the accuracy and complexity problems of multi-classification algorithms, a divisibility measurement condition and a binary tree construction condition are defined on the basis of factor space theory. Building on the balanced binary tree and combining the relationship between class spacing and sample circle radius, a factor support vector multi-classification algorithm (M-FSV) is proposed using recursion and the principle of “easy classification first”. The algorithm is compared experimentally with the one-versus-one support vector machine and the balanced binary tree support vector machine. Results on 8 data sets from the UCI database show that M-FSV trains faster than SVM and achieves higher accuracy. The research expands the theory and application of factor space and provides a new idea and a simple method for classification problems in machine learning.
Kaijie Zhang, Fanhui Zeng, Jiaxin Li

Large Language Model

Frontmatter
Improve LLM Inference Performance with Matrix Decomposition Strategies
Abstract
Large Language Models (LLMs) are highly effective in various applications but are often limited by their performance (both efficiency and accuracy) during the inference stage. This paper introduces a novel compression technique that leverages Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NMF) within the MLP layers of transformer-based LLMs. By incorporating adaptive batch sizing and various initialization methods, our method significantly enhances the inference efficiency of these models without compromising their accuracy. We present empirical evidence showing that our method improves both model efficiency and accuracy during the inference stage. Specifically, with SVD decomposition, we achieve a 1.6x speedup in inference token processing while retaining over 95% of the original model’s accuracy. Additionally, through NMF decomposition, we observe up to a 7% improvement in model accuracy compared with the original model, while maintaining or slightly enhancing token processing efficiency. These observations suggest that different matrix decomposition techniques can be strategically employed depending on the application requirements: SVD decomposition to boost efficiency, and NMF decomposition to enhance accuracy.
Jiyuan Shi, Chunqi Shi
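To make the efficiency argument in the abstract above concrete, the following sketch works through the generic arithmetic of a rank-r factorization: an m x n weight matrix replaced by an m x r and an r x n factor stores r(m + n) values instead of mn, so it only saves parameters when r < mn / (m + n). This is standard low-rank bookkeeping, not the paper's procedure; the function names and the example layer shape are illustrative assumptions.

```python
def dense_params(m: int, n: int) -> int:
    """Number of weights in a dense m x n matrix."""
    return m * n


def lowrank_params(m: int, n: int, r: int) -> int:
    """Number of weights after factoring into (m x r) and (r x n)."""
    return r * (m + n)


def max_useful_rank(m: int, n: int) -> int:
    """Largest rank r for which the factored form is strictly smaller."""
    return (m * n - 1) // (m + n)


def compression_ratio(m: int, n: int, r: int) -> float:
    """Dense size divided by factored size (values above 1 mean savings)."""
    return dense_params(m, n) / lowrank_params(m, n, r)


if __name__ == "__main__":
    # Hypothetical MLP projection shape, chosen only for illustration.
    m, n = 4096, 11008
    for r in (256, 1024, max_useful_rank(m, n)):
        print(r, lowrank_params(m, n, r), round(compression_ratio(m, n, r), 2))
```

The same count also bounds the multiply-accumulate operations per token for that layer, which is why truncating to a smaller rank trades accuracy for speed.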

Intelligent Robot

Frontmatter
Trajectory Prediction of Unmanned Surface Vehicle Based on Improved Transformer
Abstract
In recent years, the significance of applying Unmanned Surface Vehicles (USVs) in coastal defense has been progressively increasing. Precise prediction of USV trajectories plays a vital role in decision-making for coastal defense, anti-piracy, and similar tasks. However, the intricate nature of USV trajectories, characterized by high maneuverability and sudden changes in motion pattern, poses great challenges for accurate prediction. To address these issues, this paper proposes a trajectory prediction model based on an improved Transformer with sparse self-attention and physical rule constraints. We design a “Max-Mean” sparse self-attention mechanism to reduce computational demands and memory usage, and a physical loss function to improve the accuracy and robustness of predictions. Moreover, a generative decoder is included to improve the model’s ability to process long sequence data and its inference efficiency. To verify the proposed method, we construct a simulated USV trajectory dataset based on the ship kinematic model for trajectory prediction experiments. The simulation results show that the proposed model surpasses existing trajectory prediction models and fulfills the stringent requirements for precise and rapid USV trajectory prediction.
Zhipeng Cheng, Jian Yu, Junyu Chen, Jihuan Ren, Xiang Wu
Deep Neural Network Based Relocalization of Mobile Robot in Visual SLAM
Abstract
Relocalization after track loss in Visual Simultaneous Localization and Mapping (Visual SLAM) is a critical challenge in robotics, especially for autonomous mobile robots navigating dynamic environments. This paper introduces a deep learning approach that employs Deep Neural Networks (DNNs), particularly VGG16 and ResNet34, to reorient and relocalize robots effectively. Trained on a vast repository of indoor images from the Multimodal Indoor Simulator (MINOS) and the Matterport3D dataset, the DNN models discern the most viable direction for movement, be it translation or rotation, based on the robot’s current visual input in relation to its last known position within the ORB-SLAM2-generated map. The methodology involves real-time data exchange between MINOS and the ORB-SLAM2 system via a dedicated ROS node, facilitating the recovery process. Extensive testing shows that our proposed model successfully predicts the appropriate recovery action in over 90% of track loss instances, substantiating its efficacy and potential for deployment in real-world applications. This research contributes to the advancement of robust relocalization strategies in Visual SLAM, enhancing the autonomous capabilities of mobile robotics.
Azhar Muhammad Hamza, Chaoxia Shi, Yanqing Wang
A Vision-Based Method for UAV Autonomous Landing Area Detection
Abstract
Automatic identification of the landing area is crucial for Unmanned Aerial Vehicles (UAVs) to land correctly and safely. Using passive vision sensors to achieve this objective is a very promising avenue because of their low cost and their potential for performing simultaneous terrain analysis. In this paper, a computer vision method is proposed that uses an improved U-Net-based architecture on UAV imagery to assess the safe landing area. In contrast to past methods, which paid little attention to the whole landing process across multiple descending altitudes, our experiments evaluate the landing area by analyzing visual images obtained at different descending heights. In the initial stage of landing, water is separated from land, guiding the flight toward land. During descent, images captured near the ground are classified into safe and unsafe landing areas and then mapped to a safety score for selection. Experiments on public datasets show promising results.
Qiutong Zhang, Qingyuan Xia, Lisheng Wei, Bohai Deng

Perceptual Intelligence

Frontmatter
Research on Object Detection for Intelligent Sensing of Navigation Mark in Yangtze River
Abstract
The maintenance and management of navigation marks are essential for ensuring the safety of transportation on the Yangtze River. Considering the current inspection and management approaches, this paper introduces an intelligent method for inspecting inland river navigation marks using Unmanned Aerial Vehicles (UAVs). The method enables real-time monitoring of navigation marks using UAV video inspection. A UAV data acquisition platform captures video images of these marks. We have developed the ED-YOLOv5s object detection algorithm to detect and classify navigation marks. Building on this, the system can automatically assess the light quality and status of navigation marks at night. The ED-YOLOv5s algorithm is an enhancement of the YOLOv5s model, incorporating the ECA mechanism and DFFN structure, which are based on the ResNet principle. This modification enhances the model’s capability for network feature fusion. Experimental results indicate improvements in navigation mark detection with the ED-YOLOv5s. Although precision decreased by 1.76% when compared to the YOLOv5s model, recall and mAP@0.5 increased by 3.59% and 2.97%, respectively. The detection results for light quality state from video images of navigation marks at night accurately reflect actual conditions. We have developed an intelligent sensing scheme for navigation marks on the Yangtze River based on the improved model. This scheme has been implemented in the Yichang section of the Yangtze River, significantly reducing the cost of daily inspections, enhancing cruise monitoring effectiveness, facilitating intelligent maintenance decisions for navigation marks, and further ensuring the navigational safety of the Yangtze River.
Taotao He, Pinfu Yang, Xiaofeng Zou, Shengli Zhang, Shuqing Cao, Chaohua Gan
Cascaded Sliding-Window-Based Relativistic GAN Fusion for Perceptual and Consistent Video Super-Resolution
Abstract
Perceptual video super-resolution aims at converting low-resolution videos to visually appealing high-resolution ones. It may lead to temporal inconsistency due to the drastically changing outputs. In this paper, we propose cascaded sliding-window-based relativistic GAN (Generative Adversarial Network) fusion for perceptual and consistent video super-resolution (PC-VSR). Firstly, cascaded sliding-window-based relativistic GAN is designed to extract more useful information. It enlarges the temporal receptive field of sliding-window-based model in each step. It is able to enhance perceptual quality and compensate temporal consistency progressively and sufficiently. The trained separate refinement generator networks are fused into a final refinement generator. The final refinement generator can be calculated recursively at the testing stage. With our generator fusion, the parameter number is reduced and good quality is maintained. Extensive experimental results demonstrate that our approach outperforms state-of-the-art super-resolution methods in terms of perceptual quality. Our method also achieves good temporal consistency and per-pixel accuracy, compared with other perceptual approaches.
Dingyi Li
Integration of Raman Spectroscopy, On-Line Microscopic Imaging and Deep Learning-Based Image Analysis for Real-Time Monitoring of Cell Culture Process
Abstract
Traditionally, condition monitoring of mammalian cell culture processes is based on sampling and off-line analysis, which is labour intensive, time consuming, and causes time delays. In this work, an in situ microscope and on-line Raman spectroscopy are investigated for simultaneous measurement of multiple properties of the cell growth state and biochemical indices of suspended animal cells. The focus is on investigating the deep learning-based Mask R-CNN algorithm for image analysis. The model is trained on 184 images containing 183,040 cells, using data augmentation and transfer learning. Mask R-CNN segments clustered cells more effectively than the conventional method, which combines edge detection, intensity thresholding, and an advanced watershed method. The evolution of geometrical features of cells is further analyzed, including equivalent diameter, circularity, and aspect ratio. The work demonstrates the great potential of deep learning in the analysis of on-line images for control of the cell culture process.
Xiaoli Wang, Guangzheng Zhou, Xue Zhong Wang
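The geometrical features named in the abstract above have widely used textbook definitions, sketched below for a single segmented cell given its area, perimeter, and axis lengths. Whether the paper uses exactly these definitions is an assumption (aspect ratio in particular has several conventions), and the function names are hypothetical.

```python
import math


def equivalent_diameter(area: float) -> float:
    """Diameter of the circle that has the same area as the region."""
    return math.sqrt(4.0 * area / math.pi)


def circularity(area: float, perimeter: float) -> float:
    """4*pi*A / P^2: equals 1 for a perfect circle, smaller for irregular shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


def aspect_ratio(major_axis: float, minor_axis: float) -> float:
    """Major over minor axis length (one common convention, always >= 1)."""
    return major_axis / minor_axis


if __name__ == "__main__":
    # Sanity check on an ideal circle of radius 5 (arbitrary units).
    r = 5.0
    area, perimeter = math.pi * r ** 2, 2.0 * math.pi * r
    print(equivalent_diameter(area), circularity(area, perimeter), aspect_ratio(2 * r, 2 * r))
```

On pixel masks the perimeter estimate is discretized, so measured circularity can deviate slightly from the ideal values even for round cells.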
DRL-SLAM: Enhanced Object Detection Fusion with Improved YOLOv8
Abstract
Simultaneous Localization and Mapping (SLAM) technologies are pivotal in advancing robotics and autonomous navigation, particularly within challenging indoor environments. These systems face significant challenges due to dynamic variables such as fluctuating lighting, occlusions, and the presence of moving objects. In response, this paper introduces a novel integration of Deep Reinforcement Learning (DRL) with the advanced object detection capabilities of YOLOv8 within a SLAM framework, termed DRL-SLAM YOLOv8. This integration enhances object detection by leveraging DRL’s ability to learn from environmental interactions and YOLOv8’s precise and efficient real-time object recognition. Our experiments demonstrate the superiority of DRL-SLAM YOLOv8 over traditional methods, with marked improvements in detection accuracy and system reliability under diverse conditions. Notably, our approach significantly improves navigational effectiveness, as evidenced by increased distance coverage and goal achievement compared to standard SLAM techniques, validating its potential in real-world applications.
Farooq Usman, Chaoxia Shi, Yanqing Wang
Driver Fatigue Recognition Based on EEG Signal and Semi-supervised Learning
Abstract
Driving fatigue has become a serious hidden danger to road traffic safety. Drivers in a fatigued state often have problems such as delayed reactions and lack of concentration, which increase the risk of traffic accidents. During fatigued driving, the brain activity of the driver undergoes a series of changes, such as a decrease in the frequency of brain waves and a decrease in the amplitude of electroencephalogram (EEG) signals. We therefore propose a novel Semi-supervised Label Propagation with Optimal Graph Learning (SOGL) model for identifying the fatigue state of drivers. This model uses class information from a small amount of labeled EEG data to assist the learning of unlabeled data and uses soft projection matrix learning to handle non-linear data structures. In addition, we introduce a partially labeled graph learning method that extracts potential data structure information through graph structure learning techniques to improve the robustness and generalization ability of the model. Experimental results show that the model performs well on a driving fatigue dataset.
Lin Chen, Xiaobo Chen
SC-EcapaTdnn: ECAPA-TDNN with Separable Convolutional for Speaker Recognition
Abstract
The efficacy of time-delay neural networks (TDNN) in speaker recognition has been demonstrated. ECAPA-TDNN builds on TDNN, improving performance levels at the cost of increased computational complexity and slower inference speed. However, the effectiveness of ECAPA-TDNN does not meet the expected standards for speaker recognition in complex scenarios. This motivates us to seek an architecture superior to ECAPA-TDNN. In this paper, we propose an efficient network called SC-EcapaTdnn, which is a fusion of separable convolutions and ECAPA-TDNN. This innovative design uses ECAPA-TDNN as the backbone, uses depth-separable convolution blocks to encode acoustic features, and generates high-resolution frequency feature maps, allowing the backbone model to obtain more refined and effective speaker features. At the same time, we replace the squeeze excitation (SE) module in ECAPA-TDNN with adaptive one-dimensional convolution to generate channel attention weights to extract inter-channel dependencies. Ultimately, we trade a small increase in model parameters for a significant increase in performance. Training on the AISHELL and CN-Celeb datasets shows that our proposed architecture outperforms other mainstream speaker recognition systems.
Erhua Zhang, Yifan Wu, Zhenmin Tang

AI for Science

Frontmatter
Evolving Financial Markets: The Impact and Efficiency of AI-Driven Trading Strategies
Abstract
This paper investigates the impact of Artificial Intelligence (AI) on trading strategies in financial markets, comparing AI-driven approaches with traditional methodologies and their effects on market efficiency, liquidity, and volatility. It critically examines how AI challenges established financial theories, such as the Efficient Market Hypothesis and behavioral finance, suggesting a potential redefinition of market dynamics considering AI’s superior data processing and analytical capabilities. The study synthesizes academic literature, theoretical insights, and speculative analysis to assess the multifaceted implications of AI’s integration into trading practices. Findings highlight AI-driven strategies’ enhanced risk-adjusted returns and contribution to market efficiency, yet underscore a complex impact on market dynamics, where AI can both improve liquidity and introduce volatility. Ethical considerations and regulatory challenges are emphasized, pointing to the need for transparent and adaptive regulatory frameworks to address the opacity of AI decision-making and ensure market integrity. The paper advocates for interdisciplinary research and collaboration among technologists, regulators, and market participants to navigate the evolving landscape of AI in trading. Through this exploration, the study contributes to the discourse on AI-driven trading, balancing the benefits of technological advancements against the risks and challenges, and underscores the critical role of regulatory oversight in shaping the future of financial markets.
Zhiyi Liu, Kai Zhang, Deyu Miao
DSFM Method: A New Approach to Enhancing Discrimination Ability on AI-Generated Datasets
Abstract
In recent years, generative large models have achieved remarkable progress, attracting widespread attention. With the rapid development of applications based on these models, public interest in creativity has significantly increased. Generated images have become prevalent in mainstream media and social networks, covering a wide range of topics and domains. Although these technologies offer unprecedented opportunities for numerous industries, they also come with potential issues such as copyright infringement and information forgery. Existing models for detecting synthetic images typically suffer from low accuracy and weak generalization capabilities. To address these issues, we propose a novel method named DSFM. This method combines ResNet and Vision Transformer to focus simultaneously on shallow and deep information, thereby enhancing the overall performance of the model. Experiments conducted on four datasets (AGI, MMAF, Gide COCO, and SFHQ LSUN) demonstrate that our model significantly outperforms baseline models on the MMAF and Gide COCO datasets, with performance improvements nearing 30% on the MMAF dataset. Although improvements on the AGI dataset were modest, the model still displayed competitive performance. Under multiple evaluation metrics, the method shows excellent accuracy and superior generalization capabilities. The related datasets and code have been made publicly available and can be accessed at https://github.com/veinhao/ for further information.
Bin Wang, Wenhao Wang, Pingping Wang, Jinyu Cong, Jian Wang, Benzheng Wei

Medical Artificial Intelligence

Frontmatter
Enhancing Weakly Supervised Medical Segmentation via Heterogeneous Co-training with Box-Wise Augmentation and Pseudo-Label Filtering
Abstract
In this paper, we introduce an innovative approach to weakly supervised medical image segmentation with box annotations. Unlike previous methods, which simply use a single conventional network with the same augmentation techniques widely used in supervised segmentation, we introduce diverse augmentations and heterogeneous networks to leverage the box annotations for promising generalization ability. Specifically, to amplify the diversity between the contents within the box and its surroundings, we propose the interior and exterior box augmentation (IEBA) technique, in which distinct augmentation techniques are employed for regions inside and outside the bounding boxes. To select pseudo-labels of superior quality, we propose the pseudo-label filter module (PLFM) to eliminate unreliable pseudo-labels. In addition, since CNNs excel at acquiring local information and ViTs specialize in capturing global context, we facilitate a bidirectional learning process between CNN and ViT through quadruple cross consistency losses (QCCL). At inference, we employ only the model that performs better on the validation set, for parameter efficiency. Our approach is evaluated across four tasks on two public datasets, using the 3D Dice similarity coefficient as the evaluation metric. The experimental results show that the proposed method outperforms the state-of-the-art comparison methods.
You Wang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao
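The evaluation metric mentioned in the abstract above, the Dice similarity coefficient, is defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The sketch below computes it on flattened binary masks (a 3D volume would simply be flattened first); returning 1.0 when both masks are empty is a convention assumed here, and the function name is hypothetical.

```python
from typing import Iterable


def dice_coefficient(pred: Iterable[int], target: Iterable[int]) -> float:
    """Dice similarity between two flattened binary masks of equal length."""
    p = [bool(x) for x in pred]
    t = [bool(x) for x in target]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(p) + sum(t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(dice_coefficient(prediction, reference))
```

Dice weights the overlap against the combined size of both masks, so it is less dominated by the large background region than plain voxel accuracy.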
FCGA-Former: A Hybrid Factor Space Classification Model for Predicting the Tumor Mutation Burden of Lung Adenocarcinoma
Abstract
Lung cancer is a malignant tumor with the highest mutation rate in the world, among which non-small cell lung cancer (NSCLC) has a very high mutation rate. In recent years, medical research has found that tumor mutation burden (TMB) can predict the response to cancer treatment, immunotherapy, and chemotherapy. However, the traditional method of calculating TMB using gene prediction technology has the disadvantages of high detection cost, long cycle, and intensive sample interrogation. To solve these problems, we propose a hybrid deep learning model (FCGA-Former) to predict TMB automatically, aiming to save pathologists’ time. To address the semantic gap in medical images, the factor space is applied to the semantic embedding space, so that the high-level semantic space is consistent with the underlying image feature space and the data features are directly related to the information expressed by the images. The lung adenocarcinoma histopathology image dataset was taken from the TCGA database and included 271 high-TMB and 66 low-TMB samples. Experimental results show that the maximum average area under the curve (AUC) of this model is 98.1%. FCGA-Former is also discussed alongside other models in terms of interpretability, as it provides more accurate results. The results of this study are of great significance in guiding lymph node treatment of NSCLC. This lays the foundation for deploying automatic classification decision-making systems in clinical applications and using deep learning technology to predict TMB.
Ziang Cai, Han Zhang, Ziyi Yang, Xiaoyan Zhang
Backmatter
Metadata
Title
Intelligence Science V
Edited by
Zhongzhi Shi
Michael Witbrock
Qi Tian
Copyright Year
2025
Electronic ISBN
978-3-031-71253-1
Print ISBN
978-3-031-71252-4
DOI
https://doi.org/10.1007/978-3-031-71253-1