
Multi-disciplinary Trends in Artificial Intelligence

18th International Conference, MIWAI 2025, Ho Chi Minh City, Vietnam, December 3–5, 2025, Proceedings, Part II

  • 2026
  • Book

About this book

This three-volume set constitutes the proceedings of the 18th International Conference on Multi-disciplinary Trends in Artificial Intelligence, MIWAI 2025, held in Ho Chi Minh City, Vietnam, during December 3–5, 2025. The 110 full papers presented in these proceedings were carefully reviewed and selected from 306 submissions. The papers focus on various topics in AI and its applications, such as deep learning, machine learning, computer vision, pattern recognition, and natural language processing.

Table of Contents

Frontmatter
Solving the Student-Project Allocation Problem with Preferences over Projects Using a Multi-start Local Search

The Student-Project Allocation problem with Preferences over Projects is a well-studied many-to-one stable matching problem with practical relevance in academic assignment systems. Finding a maximum stable matching that assigns the largest possible number of students to their acceptable projects is known to be NP-hard. In this paper, we introduce a multi-start local search algorithm to tackle this challenge, where each local search begins with a random matching and iteratively eliminates undominated blocking pairs to improve its stability. When a local search reaches a stable but incomplete matching, it applies a perturbation strategy to reassign unallocated students based on their preferences, thereby encouraging further improvements. The algorithm repeats this process across multiple local searches and returns the largest stable matching found. Experimental results demonstrate that our approach efficiently achieves large stable matchings within a reasonable computational time, even for large-scale instances.

Le Quoc Anh, Nguyen Nhu Son, Son Thanh Cao, Hoang Huu Viet
A Small Language Model and Domain-Specific Resources for Vietnamese Public Services

Recent advancements in large language models (LLMs) have enabled impressive capabilities across general-purpose tasks. However, applying these models to low-resource domains such as public administrative services in Vietnam remains a challenge due to data scarcity, domain complexity, and computational constraints. To address this, we present VietPSLM (Vietnamese Public Service Language Model), a compact, instruction-following language model fine-tuned for Vietnamese public service question answering. Our approach includes domain-adaptive pretraining, supervised fine-tuning, and a two-pass inference strategy to improve clarity and factual accuracy. Alongside VietPSLM, we release a suite of domain-specific datasets, including an unlabeled corpus for pretraining, a factual QA dataset, and two evaluation benchmarks. Despite its small size, VietPSLM delivers competitive performance, approaching the accuracy of larger proprietary systems such as Gemini 2.0 Flash. These results highlight that targeted adaptation and high-quality data can enable lightweight models to perform effectively in real-world governmental settings.

Van Thai Le, Anh-Cuong Le
Multimodal Tree Crown Detection and Carbon Stock Estimation from Remote Sensing Imagery

Accurate estimation of forest carbon stocks from remote-sensing imagery is critical for climate monitoring and ecosystem management. We propose HAF R-CNN (Height-perceptual Attention Fusion R-CNN), a Faster R-CNN extension that fuses RGB (red-green-blue) and canopy height model (CHM) features via multi-level cross-attention to improve tree crown detection. Unlike previously proposed RGB-CHM fusion approaches that rely mainly on simple concatenation or late fusion, our cross-attention design enables richer bidirectional structural-spectral interactions. We further develop a tree-level carbon estimation pipeline combining crown structural descriptors and vegetation indices with a Random Forest regressor. On the NEON dataset, HAF R-CNN outperforms baselines in detection, and the pipeline achieves reliable carbon estimation at the OSBS site ($$R^2$$ = 0.67, MAE = 59.6 kg), though performance decreases in denser forests (MLBS). These results highlight both the promise and current limitations of multimodal detection for scalable carbon monitoring.

Manh-Tan Doan, Quoc-Ngoc Ly
A Classificatory Topos: Refining Evolving Knowledge in Multi-agent Learning Systems

We propose a Classificatory Topos, a mathematical framework to model the dynamic evolution of knowledge within a finite system of interacting learning machines. Following guidelines of category theory, the construction establishes a Grothendieck topos, $$\text{Sh}(C_{\text{learn}}, J)$$, as a mathematical universe for this problem domain. By defining a base site on a category of epistemic states with causal morphisms, and equipping it with a Grothendieck topology that formalizes a logic of justification, the framework provides a rich, non-linear model of system evolution. The use of sheaves ensures causal consistency, while the internal logic of the topos, governed by a subobject classifier, provides the machinery to trace, verify, and explain the refinement of classifications.

Manuel Hernández, Eduardo Sánchez-Soto, C.H. Castañeda-Roldán
A Survey of Algorithmic and Contextual Decomposition Methods Across Language Model Pipelines

As language models continue to scale in parameter count and reasoning complexity, achieving efficient adaptation and interpretable inference has become a pressing challenge. Decomposition offers a versatile paradigm that addresses both concerns, enabling structural reduction during model training and logical segmentation during inference. This survey presents a comprehensive analysis of 29 decomposition methods categorized into algorithmic and contextual types. We examine how these methods restructure weight matrices, control update granularity, and decompose information content to support downstream tasks. Through extensive interpretations for each of these methods, we highlight the growing role of decomposition as both a mathematical and reasoning tool. Our findings offer a structured reference for advancing efficiency, modularity, and interpretability in large language models, with implications for research in model training, algorithm optimization, and knowledge-grounded inference. This work targets researchers and practitioners seeking scalable solutions across NLP domains where resource constraints and reasoning depth demand more than traditional modeling and inference pipelines.

Mengyao Zhu, Phuc Huu Nguyen
Uncertainty Quantification for Flood Forecasting in Small Catchments

The quantification of uncertainty in flood forecasting, especially in small catchments, presents significant challenges due to the inherent uncertainty in precipitation nowcasts and forecasts. Several authors have investigated uncertainty quantification for time series forecasting in general and for flood forecasting specifically. To our knowledge, none have focused on small catchments, tidal influences, large hourly forecast horizons, or the incorporation of classic forecasting tools such as differencing. We implement and evaluate approaches using Conformal Prediction, Monte Carlo Dropout, Ensembles, and direct distribution forecasting to quantify the uncertainty of LSTM networks trained to predict the change in water level for the next 48 h in three catchments in Northern Germany. We analyze the performance of the different approaches regarding the width and accuracy of the prediction intervals. Our study shows that the incorporation of differencing strongly influences which uncertainty quantification methods are suitable, with direct distribution forecasting ignoring correlations between the forecasting steps, and Conformal Prediction being the most suitable for our specific datasets.

Michel Spils, Sven Tomforde
Efficient Learning of Horn Formulas over Finite Totally Ordered Domains

One of the core activities in learning is the synthesis of concepts and their relationships by generalizing from positive and negative samples. Information hidden in data becomes explicit, relations emerge that provide insights and serve as explanations. Nowadays, machine learning is able to process large quantities of data and to build models that classify new data with such a success that the algorithms are termed ‘intelligent’. With most approaches, though, the models are black boxes: It is usually difficult to explain why a model classifies data in a particular way. Our work is an approach to machine learning that provides explanations. We present algorithms that construct if-then rules (Horn clauses) from samples and counter-samples. We generalize results for binary data to finite totally-ordered domains by relying on literals of the form $$x\ge d$$ and $$x\le d$$, with x and d being domain variables and constants, respectively. This way we avoid the need to binarize data over such domains, which usually entails an increase in variables and output that is hard to interpret. We present both an offline and an online algorithm. In the first case, the positive and negative samples are known from the outset, while in the second case the samples arrive one by one and lead to incremental changes to the formula. Both algorithms are linear in the number of positive and negative samples as well as in the number of variables, while the size of the resulting formula does not depend on the number of positive samples. Besides analyzing the asymptotic complexity, we use a C++ implementation to evaluate the algorithms on some datasets. We conclude with a discussion of various extensions.

Miki Hermann, Gernot Salzer
Knowledge-Enhanced Vietnamese Paraphrase Identification

Paraphrase identification (PI) is a fundamental task in natural language processing (NLP) that determines whether a pair of sentences convey the same meaning. This task plays a crucial role in various applications such as machine translation, computer-assisted translation, and question answering. While extensive research has been conducted in English and several other languages, Vietnamese PI remains relatively underexplored. Pretrained language models (PLMs) have become the standard approach for tackling language understanding tasks, including PI. However, despite their rapid advancement, these models are still limited in their capacity to capture external knowledge. In this study, we propose a novel architecture that integrates PLMs with external knowledge for Vietnamese PI. Experimental results show that our approach, using mBERT as a base model, achieved an F1-score of 95.59% on a combined corpus consisting of vnPara and an additional 1498 sentence pairs enriched with diverse entities. This highlights the effectiveness of our approach in distinguishing named entities and understanding external knowledge.

Minh Lu Xuan, Minh Nguyen Hong, Loc Nguyen Xuan, Thai Do Thanh, Duc Bui Tien, Quang Tran Minh
Co-NAML-LSTUR: A Combined Model with Attentive Multi-view Learning and Long-and Short-Term User Representations for News Recommendation

News recommendation systems play a critical role in alleviating information overload by delivering personalized content. A key challenge lies in jointly modeling multi-view representations of news articles and capturing the dynamic, dual-scale nature of user interests, encompassing both short- and long-term preferences. Prior methods often rely on single-view features or insufficiently model user behavior across time. In this work, we introduce Co-NAML-LSTUR, a hybrid news recommendation framework that integrates NAML for attentive multi-view news encoding and LSTUR for hierarchical user modeling, designed for training on limited data resources. Our approach leverages BERT-based embeddings to enhance semantic representation. We evaluate Co-NAML-LSTUR on two widely used benchmarks, MIND-small and MIND-large. Results show that our model significantly outperforms strong baselines, achieving improvements over NRMS by 1.55% in AUC and 1.15% in MRR, and over NAML by 2.45% in AUC and 1.71% in MRR. These findings highlight the effectiveness of our efficiency-focused hybrid model, which combines multi-view news modeling with dual-scale user representations for practical, resource-limited settings, rather than claiming absolute state-of-the-art (SOTA) performance. Code is publicly available at https://github.com/MinhNguyenDS/Co-NAML-LSTUR.

Minh Hoang Nguyen, Thuat Thien Nguyen, Minh Nhat Ta, Tung Le, Huy Tien Nguyen
ScatterRAG: A Framework for Decentralized Graph Routing in RAG System

Retrieval-Augmented Generation (RAG) systems increasingly leverage structured knowledge graphs to enhance factual accuracy and interpretability. However, scaling such systems introduces challenges in routing queries efficiently across partitioned knowledge sources, particularly under memory and bandwidth constraints. We propose ScatterRAG, a lightweight and scalable routing framework for distributed RAG over partitioned knowledge graphs. ScatterRAG employs Bloom filters for compact indexing and fast negative lookups, combined with fuzzy matching to address lexical variation and noisy queries. This approach enables efficient query filtering and routing without centralized control or exhaustive broadcast. Experimental evaluations on the Natural Questions benchmark show that ScatterRAG achieves acceptable memory usage while maintaining scalability in distributed environments. Although its current prototype yields slower inference and lower retrieval accuracy compared to centralized baselines, ScatterRAG demonstrates a practical balance between scalability and resource efficiency, providing a promising foundation for future research on decentralized and efficient RAG architectures.

Nhat Ho Minh, Long Le Pham Tien, Kien Nguyen Trung, Trong Nhan Phan
A Competition-Based Large Neighborhood Search for Vessel Routing Optimization Deriving for Sustainable Marine Debris Cleanup

The rapid increase in global solid waste generation has intensified marine pollution, as waste flows from land to oceans through rivers and canals. Studies estimate that millions of metric tons of plastic enter the ocean annually, dispersing widely due to ocean currents, tides, and winds. This accumulation on coastlines and the ocean surface poses significant environmental, economic, and health risks. Global initiatives have been introduced to improve waste management and reduce marine litter, but waste collection remains a logistical challenge, particularly in developing countries with limited resources. Optimization models, particularly the Capacitated Vehicle Routing Problem (CVRP), have been widely applied to enhance collection efficiency. Recent research extends these models to marine debris collection, integrating environmental data and predictive models to optimize vessel routing while minimizing costs and emissions. This study advances prior work by proposing a Competition-Based Large Neighborhood Search (CLNS) algorithm for large-scale Marine Debris Collection Problems (MDCP), along with a modified version of a state-of-the-art hybrid algorithm from the literature. The effectiveness of our method is validated through comparative evaluations on multiple test cases.

Trinh Duc Minh, Tat-Hien Le
Bridging Usability and User Experience in AI-Based Fall Detection: A Systematic Literature Review on User-Centered Design Approach for Enhanced Adoption

This study explores the role of User-Centered Design (UCD) in improving the usability and adoption of AI-based fall detection systems. Despite advances in artificial intelligence and wearable healthcare technology, usability challenges, including alert fatigue, complex interfaces, and poor customization, limit system effectiveness and user adoption. This paper systematically reviews the usability challenges, UX solutions, and technology adoption models related to AI-driven fall detection using a PRISMA-based SLR methodology. Thematic analysis of selected studies identifies key usability barriers, UX enhancement strategies, and the role of TAM & UTAUT adoption models in driving system engagement. Findings suggest that multimodal UX, AI explainability, and adaptive alerts significantly improve adoption rates. The study provides design recommendations and future research directions for enhancing AI-driven fall detection through UCD methodologies.

Anwar E. Khidzir, Waidah Ismail, Mahadi Bahari, Ali Y. Aldailamy
Autoencoder-Based Deep Features for Internal Defect Detection in Mangoes Using NIR Spectral Data

Near-Infrared (NIR) spectroscopy offers a powerful non-destructive approach for detecting internal defects in mangoes, such as spongy tissue. In this study, both lower wavelength (LW) and higher wavelength (HW) NIR spectral regions were analyzed. A customized preprocessing pipeline is implemented using techniques such as Savitzky–Golay smoothing, Multiplicative Scatter Correction (MSC), Extended MSC (EMSC), Standard Normal Variate (SNV), and detrending to enhance spectral quality. For dimensionality reduction, a tailored Autoencoder with Multi-Head Attention (AE-MHA) is employed. Extensive experiments were conducted using both AE-MHA and boosting-based classifiers (LightGBM and XGBoost), with all models evaluated using 5-fold cross-validation. The AE-MHA model, particularly on HW data, achieved the highest performance: 93.75% overall accuracy and 100% accuracy in identifying defective mangoes. The HW spectral region significantly outperformed LW and also surpassed previous methods reported by Guru and Nandini (2025). This confirms the effectiveness of the proposed non-invasive framework for real-time detection of internal fruit defects, showing strong potential for integration into smart agriculture systems.

D. Nandini, D. S. Guru
Robust 3D Virtual Try-On Under Complex Poses

This paper presents Pose-Robust 3D Virtual Try-On Network (PR-VTON), a novel framework addressing the persistent challenges of achieving realistic 3D virtual try-on under complex human poses. In contrast to prior works that primarily handle frontal or simplified postures, PR-VTON is designed to accommodate extreme variations such as side-facing stances, crossed arms, and severe self-occlusions, conditions that often lead to geometric distortions and inconsistent garment rendering. The proposed approach integrates a personalized diffusion model with a pose-aware 3D Gaussian Splatting editing pipeline, enabling fine-grained garment transfer while preserving high-fidelity geometry and texture across multiple viewpoints. To support training and evaluation, a curated and pre-processed dataset named PR-VTON3D is introduced, containing diverse clothing types and challenging poses that offer realistic scenarios for robust model development. Through a reference-driven multi-view editing strategy and a multi-level attention fusion mechanism, PR-VTON achieves superior cross-view consistency, garment similarity, and visual realism compared to state-of-the-art baselines. Experimental results and user studies demonstrate that the proposed framework significantly enhances the reliability of 3D virtual try-on systems in real-world conditions, establishing a new benchmark for pose-invariant garment transfer. Code is available at: https://github.com/nguyendinhhieu1309/PR-VTON.

Nguyen Dinh Hieu, Pham Thi Van Anh, Do Ngoc Bich, Tran Tien Long, Pham Hong Phong
Community Detection in Complex Overlapping Networks Using Graph Autoencoders with Semi-supervised Fuzzy Clustering

Detecting overlapping communities is a fundamental challenge in network analysis. Communities represent groups of nodes that are more densely connected internally than to the rest of the network, often corresponding to functional modules or social groups in real-world systems. This paper proposes a hybrid framework called GAE.SC that combines a Graph Autoencoder (GAE) with a semi-supervised fuzzy clustering algorithm to address this task. Our model first employs a GAE, trained directly on the input network, to learn low-dimensional, structure-preserving node embeddings from both graph topology and node attributes. These embeddings subsequently serve as input for a semi-supervised clustering algorithm, which incorporates supervision constraints to enhance the accuracy and interpretability of the community detection process. Experimental results on five benchmark datasets demonstrate that our method consistently outperforms GCNFCM, a validated and robust baseline for overlapping community detection in complex networks.

Nguyen Hai Yen, Vo Duc Quang, Tran Dinh Khang, Phan Anh Phong
Graph-Attention Policy Gradient Framework for Adaptive Traffic Signal Control in Complex Urban Networks

Traffic congestion remains a critical challenge for modern cities, causing economic loss, environmental damage, and reduced quality of life. Traditional signal control methods such as fixed-time and actuated systems cannot adapt to highly dynamic conditions, particularly in Hanoi, Vietnam, where irregular road layouts, non-lane-based driving, and motorcycle dominance create extreme variability. We propose GATLIGHT (Graph-Attention Traffic Light Control), a deep reinforcement learning framework that combines policy gradient optimization with a graph attention network to achieve decentralized yet coordinated signal control. Each intersection is modeled as an intelligent agent that exchanges attention-weighted information with its neighbors, enabling adaptive and network-aware decisions. The state representation encodes normalized vehicle counts and signal timing, while the reward minimizes the standard deviation of traffic distribution to balance flows and reduce bottlenecks. GATLIGHT is trained and evaluated in SUMO using both synthetic networks and real-world datasets from New York, Hangzhou, Jinan, and Hanoi. Compared to state-of-the-art methods such as PressLight and CoLight, GATLIGHT achieves up to 26.7% lower travel time and consistently reduces waiting time and queue length under real-world Hanoi traffic. These results demonstrate that graph-based reinforcement learning provides a scalable and robust solution for adaptive traffic management in some of the most complex and volatile urban environments. Code is available at: https://github.com/nvanhieu25/tsinghuaRL.

Nguyen Van Hieu, Nguyen Phuong Linh, Do Ngoc Bich, Nguyen Dinh Hieu
CLIP-AMR-GPT: Enhancing Image Captioning via Cross-Modal Semantics Fusion and GPT-Based Re-ranking

Current image captioning models still face several critical challenges: insufficient exploitation of semantic knowledge, the absence of effective mechanisms to integrate heterogeneous feature sources, and limitations in generating natural and coherent output captions. To address these issues, this paper proposes a novel image captioning model, CLIP-AMR-GPT, built upon an encoder–decoder architecture that integrates multi-source knowledge. Specifically, the encoder combines vision–language features extracted from the CLIP model, relational graph embeddings representing semantic relationships among objects, and Abstract Meaning Representation (AMR) embeddings derived from ground-truth captions. The decoder employs an adaptive attention mechanism to dynamically regulate the influence of AMR-like graph embeddings at each word generation step, thereby enabling flexible exploitation of semantic structural information. Furthermore, a GPT-2-based re-ranking module is incorporated to evaluate and select captions with the highest linguistic likelihood, enhancing fluency and coherence. Experimental evaluations on the MS COCO benchmark dataset demonstrate that the proposed model outperforms state-of-the-art methods, validating the effectiveness of integrating visual, semantic, and linguistic knowledge into image captioning models.

Nguyen Van Thinh, Tran Van Lang, Nguyen Minh Hai
IUFlowGen: An AI System for Converting Procedural Texts into Flowcharts

Procedural documents are common in domains such as technical operations and legal compliance, yet their unstructured and complex logic often hinders comprehension. Converting such texts into flowcharts improves clarity and reduces cognitive load, but existing AI systems typically fall short: they produce static outputs without clarification, fail to adapt to varying document complexity, and lack a principled way to determine when automation is genuinely helpful. To address these challenges, we present IUFlowGen, an AI-assisted, human-in-the-loop system that combines retrieval-augmented generation, LLM reasoning, and graph-based modeling to generate structured and interactive flowcharts. We also introduce a ten-factor rubric to quantify procedural document complexity, enabling adaptive support based on document difficulty. Experiments on 30 documents of varying complexity show that IUFlowGen achieves high accuracy and completeness across all ten factors, demonstrating broader coverage than prior systems. By integrating interactivity, complexity assessment, and clarification, IUFlowGen provides a practical and effective solution for improving user comprehension of complex procedural content.

Nhat-Khiem Nguyen, Thanh-Tung Tran
Compact Yet Powerful: Group Query Attention in TinyViT Student Models for Efficient Classifications

Vision Transformers (ViTs) perform well in computer vision but need high computational power. This paper explores Grouped Query Attention (GQA) to reduce parameters in TinyViT models with competitive accuracy. We compare GQA settings with query-to-key ratios of 2:1 to 10:1 on CIFAR-10 and CIFAR-100 benchmarks. Findings show that GQA decreases attention parameters by a maximum of 42.8% with no loss of accuracy. The GQA 5:1 setup reaches 89.95% accuracy in CIFAR-10, surpassing the usual MHA’s 89.37% while shrinking attention parameters by 39.9%. These results prove GQA’s viability for deploying effective vision transformers within resource-limited environments.

Nishitha Anand, Rachit Verma, Bhaskarjyoti Das
Potato Leaf Disease Classification in Uncontrolled Environments: Leveraging the Synergy of Handcrafted Features

Accurate classification of potato leaf diseases under uncontrolled conditions is challenging due to lighting variability, background clutter, and class imbalance. This study presents a lightweight and interpretable machine learning pipeline based on handcrafted feature engineering, specifically designed to address the challenges of uncontrolled imaging conditions in potato leaf disease classification. We systematically extract diverse feature types - including color statistics, color histograms, and BoVW-based texture descriptors (SIFT, KAZE) - and integrate them to form a comprehensive representation. To improve minority class recognition, Borderline-SMOTE is applied during training. Experimental results on a real-world potato leaf disease dataset demonstrate that the proposed approach achieves 81.03% accuracy using the LightGBM classifier, outperforming several state-of-the-art deep learning models. These results highlight the effectiveness of carefully engineered features and their integration for real-world plant disease classification.

Phi-Hung Hoang, Thi-Thu-Hong Phan
Telecom Revenue Prediction over Time Series with Pre-trained Language Models

Time series forecasting plays a critical role in many domains, yet existing methods often demand domain-specific expertise and large volumes of historical data. In this work, we introduce TELE-PLM, a novel framework that adapts pre-trained language models (PLMs) for time series forecasting, with a particular focus on telecommunication revenue prediction. Our approach transforms raw time series into text-based prototype representations and incorporates a Prompt-as-Prefix mechanism to align continuous temporal signals with the discrete token space of PLMs. We construct and release a clean, structured dataset derived from MobiFone’s transaction records, enabling reproducible research in telecom forecasting. Extensive experiments compare TELE-PLM using BERT, GPT-2, and T5 against strong baselines such as TimesNet and TimeMixer. Results demonstrate that TELE-PLM significantly outperforms traditional time series models, with T5 achieving the lowest forecasting error (MASE = 2.6846), highlighting the advantage of leveraging text-to-text architectures for temporal prediction. These findings underscore the potential of PLMs as generalizable and data-efficient forecasters. Future work will explore multi-step forecasting, cross-domain adaptation, and hybrid architectures that integrate PLMs with temporal inductive biases for enhanced performance.

Phuoc Lien, Nam Thoai, Vinh Phan
Domain Adaptation of Federated Learning by Data Generation and Server Feedback

Federated learning (FL) enables collaborative model training across distributed clients without requiring centralized access to local data. However, it faces significant challenges under domain shift, where discrepancies between client data distributions and deployment environments might degrade model performance. In this paper, we introduce a novel feedback-driven framework for enhancing domain adaptation in federated settings. Our approach incorporates server-side domain analysis to detect distributional shifts during validation and generates lightweight feedback signals such as Gradient-weighted Class Activation Mapping (Grad-CAM) and SHapley Additive exPlanations (SHAP) to highlight domain-specific patterns. These signals are transmitted to clients as adaptation cues, enabling more targeted local training. In addition, the framework supports client-side data generation using modality-appropriate generative models, including variational autoencoders for tabular data, diffusion models for images, and language models for text. This data generation further mitigates heterogeneity and strengthens model generalization. By combining server-guided feedback with client-side adaptation and data augmentation, our method significantly improves performance under domain shift. Extensive experiments demonstrate consistent gains over standard FL baselines across diverse tasks and data modalities.

Phuong-Anh Vu, Kim-Tinh Phan, Cao-Dien Nguyen, Tien-Dung Cao, Le Trieu Phong, Ngoc-Thai Nguyen
Causal Temporal Transformer: An Integrated Framework for Temporal Causal Discovery and Multi-target Prediction

This paper introduces the Causal Temporal Transformer (CTT), an end-to-end trainable transformer-based model designed for the dual tasks of causal time-series discovery and multi-target prediction. At its core, CTT employs a causal attention mechanism to distinguish and emphasize causal components while suppressing non-causal ones, thereby improving interpretability and feature relevance. Furthermore, to capture temporal dependencies with greater precision, the Lag Temporal (LT) embedding is incorporated, which facilitates the analysis of lagging characteristics intrinsic to the data. Notably, CTT can generate a causal graph that elucidates the potential causal relationships among the targeted variables and the prediction sequence, thereby enriching the interpretability of the model’s outputs. Experiments on Netsim, Kuramoto, and Lorenz-96 benchmarks demonstrate CTT’s superior causal time-series discovery capabilities, outperforming recent methods. Evaluations on ETH-UCY, nuScenes, and TrajAir datasets further show significant gains in multi-target prediction accuracy over baselines. Overall, CTT offers a robust, interpretable framework aligning inferred causal attributions with underlying drivers of multi-target time-series data.

Quang Le, Sepideh Mousazadeh, Francois Chan, Claude D’Amours, Il-Min Kim
When Self-supervised Transformers Meet Knowledge Distillation: Efficient Chest X-Ray Classification

Chest X-ray classification is limited by scarce annotations and the heavy cost of transformer models. We propose a three-stage framework: (1) self-supervised pretraining of a Vision Transformer (ViT) with DINOv2 on 880k unlabeled radiographs, (2) fine-tuning on ChestX-ray14, and (3) knowledge transfer into MobileViT using a combination of Binary Cross-Entropy (BCE) and Multi-Label Distillation (MLD) loss. The distilled MobileViT achieves a mean AUROC of 0.8404, surpassing its supervised counterpart by 1.9% while requiring only 0.31 GFLOPs. These results highlight the benefit of domain-specific self-supervised pretraining and demonstrate that the combined BCE and MLD objective effectively transfers knowledge from large ViTs into compact models. The outcome is a lightweight yet accurate system that can be applied in clinical and resource-constrained healthcare environments.
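A combined hard-label/soft-label objective of the kind described above might be sketched as follows. The exact MLD term is not specified in the abstract, so BCE against temperature-softened teacher probabilities stands in for it here; the weighting `alpha` and `temperature` are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y, eps=1e-7):
    # binary cross-entropy of predicted probability p against target y
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Per-sample multi-label loss: (1 - alpha) * BCE on hard labels plus
    alpha * a distillation term, here BCE against temperature-softened
    teacher probabilities (a plausible reading; the paper's MLD may differ)."""
    hard = sum(bce(sigmoid(s), y) for s, y in zip(student_logits, labels))
    soft = sum(bce(sigmoid(s / temperature), sigmoid(t / temperature))
               for s, t in zip(student_logits, teacher_logits))
    n = len(labels)
    return ((1.0 - alpha) * hard + alpha * soft) / n
```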

Quoc-Khang Tran, Nguyen-Khang Pham
ViPPS: Building a Multimodal Dataset for Physics Problem Solving in Vietnamese

Recent advances in multimodal large models, particularly Vision-Language models (VLMs), have demonstrated strong capabilities in handling reasoning tasks that involve both textual and visual information. However, in low-resource languages such as Vietnamese, research in educational contexts remains limited due to the absence of suitable multimodal datasets. To address this gap, we introduce ViPPS (Vietnamese Physics Problem Solving), the first multimodal dataset designed for physics problem solving in Vietnamese. ViPPS consists of nearly 500 physics-related images (e.g., circuit diagrams, mechanics illustrations, experimental setups, and graphs) paired with more than 5,500 corresponding textual questions collected from Vietnamese educational Q&A platforms. Each entry reflects authentic student learning scenarios in secondary and high school physics. We detail the dataset creation pipeline, cleaning procedures, and provide comprehensive statistical analyses of linguistic and visual characteristics. To benchmark ViPPS, we fine-tune recent VLM baselines under a supervised multimodal setting, and report their performance across different physics domains. Results highlight both the potential of current VLMs and their limitations in solving domain-specific multimodal reasoning problems in Vietnamese. We expect ViPPS to serve as a valuable resource for advancing research on multimodal learning, physics education, and AI for low-resource languages.

Quynh T. N. Vo, Xinh T. Le, Thao H. M. Tran, Tho T. Quan
Sparsity Driven Multi-Agent Reinforcement Learning for Urban Traffic Signal Optimization

Urban traffic congestion remains a critical challenge, often worsened by inefficient and frequent signal changes at intersections. Traditional reinforcement learning (RL) methods optimize for cumulative reward but neglect the cost of excessive actions, leading to unstable signal control. This instability stems from the distributed nature of the underlying computational problem: vehicular arrival patterns vary from junction to junction, complicating smooth transit through each junction without encountering a red signal. These patterns are observed by the RL agents, which coordinate with one another. To address this, we propose a Greedy Action-Minimized Reinforcement Learning (GAM-RL) framework that integrates a sparsity-inducing penalty inspired by Basis Pursuit into the Q-learning objective. Each traffic intersection is modelled as an independent RL agent that observes lane-level congestion and learns when to change or retain signal phases. The Bellman update is modified to include an ℓ₁-regularized term, discouraging unnecessary phase switches and promoting control stability. Experiments conducted over 300 episodes demonstrate that GAM-RL achieves higher average cumulative rewards than standard RL while reducing the number of signal changes per episode by a significant margin. The results confirm that GAM-RL balances flow efficiency and actuation cost, making it suitable for real-world traffic systems where stability and fairness are as important as throughput.
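A Bellman update penalizing phase switches, in the spirit described above, might look like the following sketch. The constant switch cost `lam` is an illustrative stand-in for the paper's ℓ₁-regularized term, and the state/action encoding is hypothetical:

```python
def gam_q_update(Q, state, action, prev_action, reward, next_state,
                 alpha=0.1, gamma=0.95, lam=0.5):
    """One tabular Q-learning step with a sparsity penalty: switching the
    signal phase (action != prev_action) incurs an extra cost lam, an
    l1-style term discouraging unnecessary phase changes. Illustrative
    sketch only; GAM-RL's exact objective may differ."""
    switch_cost = lam if action != prev_action else 0.0
    target = (reward - switch_cost) + gamma * max(Q[next_state].values())
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]
```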

Raghava Morusupalli, D. Teja Santosh, Jyothirmai Joshi
A Fast AI-Powered Algorithm for Robotic Wheelchair Navigation in Obstacle-Rich Environments

A robotic wheelchair can be considered a primary means of transportation for people with limited mobility, allowing them to reach their destination safely and quickly. Using an effective pathfinding algorithm in combination with artificial intelligence-based devices installed on a wheelchair can significantly improve navigation. The purpose of this paper is twofold: (a) to develop an effective pathfinding algorithm, called D-PRM, that can be applied in obstacle-rich environments and (b) to create a mathematical decision-support tool for selecting AI-based devices to be installed on a wheelchair. To achieve these goals, we construct an efficient pathfinding algorithm that combines the strengths of the Probabilistic Roadmap (PRM) and A* methods and formulate a mixed-integer mathematical programming model for AI-based device selection. Extensive computational experiments validate the performance of the proposed algorithm.
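The A* component of a PRM-plus-A* pipeline like the one above searches the sampled roadmap graph with a Euclidean heuristic. A generic sketch (not the paper's D-PRM implementation, and omitting the roadmap sampling step):

```python
import heapq
import math

def astar(graph, coords, start, goal):
    """A* over a PRM-style roadmap: `graph` maps node -> {neighbor: edge_cost},
    `coords` gives (x, y) positions used for the Euclidean heuristic.
    Returns (cost, path), or (inf, []) if the goal is unreachable."""
    def h(n):
        (x1, y1), (x2, y2) = coords[n], coords[goal]
        return math.hypot(x2 - x1, y2 - y1)

    frontier = [(h(start), 0.0, start, [start])]   # (f = g + h, g, node, path)
    best = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nb, w in graph[node].items():
            ng = g + w
            if ng < best.get(nb, float("inf")):
                best[nb] = ng
                heapq.heappush(frontier, (ng + h(nb), ng, nb, path + [nb]))
    return float("inf"), []
```

Because the Euclidean heuristic never overestimates the remaining roadmap distance, the first time the goal is popped the returned path is optimal on the sampled graph.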

Shimon Aviram, Eugene Levner, Dmitry Tsadikovich
MHEIClosed: An Efficient Algorithm for Mining Closed High-Efficiency Itemsets

High-Efficiency Itemset Mining (HEIM) is an emerging extension of utility-based mining that considers utility and investment, addressing key limitations of High Utility Itemset Mining (HUIM). However, existing HEIM methods often produce large and redundant result sets, complicating interpretation and post-analysis. To address this, we propose Closed High-Efficiency Itemset Mining (CHEIM), which focuses on extracting concise and non-redundant patterns while preserving essential utility and investment information. This paper presents MHEIClosed, a novel algorithm for efficiently mining closed high-efficiency itemsets. MHEIClosed incorporates advanced techniques from closed itemset and HEIM approaches, including pruning strategies based on tighter upper bounds (ssef and slef), transaction merging, and efficient closure checking. Extensive experiments on several real-world datasets demonstrate that MHEIClosed significantly outperforms the state-of-the-art HEPMClosed algorithm in terms of runtime and memory consumption, particularly on dense or large-scale datasets. These results validate MHEIClosed’s scalability and effectiveness in mining meaningful patterns, making it well-suited for practical data mining applications.
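The efficiency measure underlying HEIM, as described above, relates an itemset's utility to its investment. A minimal sketch of that ratio over a transaction database, with illustrative `utility` and `cost` tables (HEIM formulations vary, so treat the exact definition as an assumption):

```python
def efficiency(itemset, transactions, utility, cost):
    """Efficiency of an itemset = total utility across supporting
    transactions divided by total investment. Each transaction maps
    item -> quantity; `utility` and `cost` give per-unit values."""
    items = set(itemset)
    u = inv = 0.0
    for t in transactions:
        if items <= t.keys():                             # transaction supports the itemset
            u += sum(utility[i] * t[i] for i in items)    # quantity-weighted utility
            inv += sum(cost[i] * t[i] for i in items)     # investment to stock the items
    return u / inv if inv else 0.0
```

A closed high-efficiency itemset would additionally require that no proper superset has the same support, which is the redundancy check MHEIClosed performs efficiently.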

Linh T. T. Nguyen, Thang G. Phung, Vinh Q. Pham, Loan T. T. Nguyen
A Two-Stage Fuzzy-Guided Genetic Algorithm for University Timetabling with LLM-Based Preference Parsing

University course timetabling is a well-known NP-hard combinatorial optimization problem that involves numerous hard constraints and softer, lecturer-specific preferences. In this study, we propose a two-stage fuzzy-guided genetic algorithm for generating high-quality university schedules, addressing both lecture and laboratory sessions while maintaining their logical dependencies. In the first stage, we construct lecture timetables by enforcing all hard constraints during encoding, while the fitness function evaluates the degree of soft constraint violations, particularly vague lecturer time preferences. These preferences, initially expressed in natural language, are interpreted using a large language model (LLM) and transformed into fuzzy satisfaction scores over time slots. In the second stage, laboratory sessions are scheduled based on the previously generated lecture timetable, ensuring alignment in timing and avoiding conflicts with lecture sessions. We evaluate our method using real data from the Faculty of Computer Science and Engineering at Ho Chi Minh City University of Technology (CSE@HCMUT). The results show that our method consistently produces conflict-free schedules while aligning closely with lecturer preferences, demonstrating the effectiveness of combining LLM-based interpretation and fuzzy evaluation within a genetic optimization framework.
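Fuzzy satisfaction scores over time slots, as described above, could take the shape of a simple membership function feeding the GA fitness. The triangular form and the `tolerance` parameter below are illustrative assumptions; the paper derives its scores from an LLM:

```python
def fuzzy_satisfaction(slot_hour, preferred_hour, tolerance=2.0):
    """Triangular membership: 1.0 at the preferred hour, decaying linearly
    to 0 at +/- tolerance hours. A plausible fuzzification of vague
    preferences like "around 9am"; the paper's scores may differ."""
    return max(0.0, 1.0 - abs(slot_hour - preferred_hour) / tolerance)

def timetable_fitness(assignments, preferences):
    """Mean satisfaction across (lecturer, assigned_hour) pairs; a GA
    would maximize this soft-constraint score after hard constraints
    are enforced by the encoding."""
    scores = [fuzzy_satisfaction(hour, preferences[lect])
              for lect, hour in assignments]
    return sum(scores) / len(scores) if scores else 0.0
```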

Tam M. Nguyen, Tung T. Nguyen, Tho T. Quan
Spectral Curvature Signature: A Frequency-Domain Descriptor for Driving Behavior Classification

Understanding and classifying vehicle driving behavior is critical for the development of intelligent transportation systems (ITS) and road safety analytics. In this paper, we propose a novel frequency-domain descriptor called the Spectral Curvature Signature (SCS), which characterizes the temporal dynamics of vehicle trajectory curvature through spectral and fractal features. The proposed framework processes raw (x, y) trajectories using a Savitzky–Golay filter, computes curvature over time, and applies a Fourier Transform to extract a set of compact, interpretable descriptors, including spectral energy, centroid frequency, Hurst exponent, and Band Energy Ratios (BER). We evaluate the effectiveness of SCS on both synthetic trajectories generated from a kinematic model and real-world data from the US Highway 101 Dataset (NGSIM). After applying random undersampling to address label imbalance, the classification model achieves an accuracy of 99.8%, precision of 99.9%, recall of 99.7%, and an F1-score of 99.8% in distinguishing between safe and unsafe driving behaviors using a Random Forest classifier. Feature analysis reveals spectral energy as the most discriminative metric, while the Hurst exponent offers limited utility due to the short duration of observed sequences. The method also demonstrates robustness to data noise and class imbalance. Overall, SCS provides a principled, interpretable, and low-dimensional alternative to traditional time-domain motion descriptors, with strong potential for integration into large-scale traffic monitoring and driver behavior analytics systems.
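The curvature-then-spectrum pipeline described above can be sketched with central-difference curvature and a direct DFT energy sum. This is a minimal stand-in for the SCS descriptor (no Savitzky-Golay smoothing, Hurst exponent, or band-energy ratios):

```python
import math

def curvature(xs, ys):
    """Discrete curvature kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2)
    via central differences; the two endpoints are dropped."""
    ks = []
    for i in range(1, len(xs) - 1):
        dx, dy = (xs[i+1] - xs[i-1]) / 2.0, (ys[i+1] - ys[i-1]) / 2.0
        ddx, ddy = xs[i+1] - 2*xs[i] + xs[i-1], ys[i+1] - 2*ys[i] + ys[i-1]
        denom = (dx*dx + dy*dy) ** 1.5
        ks.append((dx*ddy - dy*ddx) / denom if denom else 0.0)
    return ks

def spectral_energy(signal):
    """Sum of squared DFT magnitudes of the mean-removed series; a
    straight-ahead O(n^2) DFT for clarity (an FFT would be used in practice)."""
    n = len(signal)
    mean = sum(signal) / n
    xs = [v - mean for v in signal]
    energy = 0.0
    for k in range(n):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(xs))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(xs))
        energy += re * re + im * im
    return energy
```

A straight trajectory yields zero curvature everywhere and hence zero spectral energy, while erratic steering raises both, which is the discriminative signal the classifier exploits.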

Do Thanh Thai, Quang Tran Minh
Adaptive Urban Traffic Signal Control via Lyapunov Optimization, LSTM-Based Forecasting, and Shock-Aware Phase Re-Service

This paper presents a novel traffic signal control methodology that combines Lyapunov Drift-Plus-Penalty (DPP) optimization with a shock-aware phase re-service mechanism and LSTM-based traffic forecasting to achieve both queue stability and delay minimization at urban intersections. The proposed controller dynamically reallocates green time in response to evolving queue lengths, particularly prioritizing oversaturated approaches to prevent congestion build-up and mitigate traffic shockwaves. LSTM networks are employed to capture real-time temporal patterns in traffic flow, enabling more responsive and data-driven decision-making. The framework is grounded in queueing theory and incorporates Kingman’s delay approximation to model system dynamics, while theoretical analysis ensures strong stability and provable near-optimal delay under admissible traffic demand. Extensive simulations were conducted across two scenarios. In the first, the proposed method was compared with traditional fixed-time control, achieving a 17.21% reduction in average waiting time, a 6.97% decrease in time loss, and a 2.35% reduction in average trip duration, with fewer teleport events and improved simulation efficiency. In the second scenario, compared to a baseline without shock-awareness, the proposed method reduced the average queue length by approximately 93.9% (from 263.54 to 16.09 vehicles) and the total queue delay by 93.9% (from 1054.17 to 64.38 vehicle·slots), with 2,064 targeted shock re-allocations contributing to improved traffic stability. These results confirm the method’s effectiveness in both proactive congestion management and reactive traffic surge mitigation. This work contributes a theoretically grounded and practically viable adaptive signal control strategy, enhanced by real-time sequence modeling, with strong potential for scalable deployment in smart urban traffic systems.
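The drift-plus-penalty decision rule at the heart of the approach above can be sketched as a per-slot phase choice: minimize the Lyapunov drift term (favoring long queues) plus a V-weighted penalty. This single-step sketch ignores forecasting and shock re-service, and the penalty vector is an assumption:

```python
def dpp_phase_choice(queues, service_rates, delay_penalty, V=1.0):
    """Pick the phase minimizing drift + V * penalty for one slot:
    serving phase p relieves its queue by min(q_p, mu_p), so the drift
    term -q_p * served favours oversaturated approaches, while V trades
    queue stability against the per-phase delay penalty."""
    best_phase, best_score = None, float("inf")
    for p, (q, mu) in enumerate(zip(queues, service_rates)):
        served = min(q, mu)
        drift = -q * served              # more negative = more queue relief
        score = drift + V * delay_penalty[p]
        if score < best_score:
            best_phase, best_score = p, score
    return best_phase
```

Raising V shifts the controller toward minimizing the penalty at the cost of longer queues, which is the standard DPP stability/optimality trade-off the paper's analysis formalizes.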

Do Thanh Thai, Quang Tran Minh
Revolutionizing Precise Low Back Pain Diagnosis via Contrastive Learning

Low back pain affects millions worldwide, driving the need for robust diagnostic models that can jointly analyze complex medical images and accompanying text reports. We present LumbarCLIP, a novel multimodal framework that leverages contrastive language-image pretraining to align lumbar spine MRI scans with corresponding radiological descriptions. Built upon a curated dataset containing axial MRI views paired with expert-written reports, LumbarCLIP integrates vision encoders (ResNet-50, Vision Transformer, Swin Transformer) with a BERT-based text encoder to extract dense representations. These are projected into a shared embedding space via learnable projection heads, configurable as linear or non-linear, and normalized to facilitate stable contrastive training using a soft CLIP loss. Our model achieves state-of-the-art performance on downstream classification, reaching up to 95.00% accuracy and 94.75% F1-score on the test set, despite inherent class imbalance. Extensive ablation studies demonstrate that linear projection heads yield more effective cross-modal alignment than non-linear variants. LumbarCLIP offers a promising foundation for automated musculoskeletal diagnosis and clinical decision support.
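A CLIP-style contrastive objective like the one above scores matched image/text pairs on the diagonal of a similarity matrix and applies a symmetric cross-entropy. The sketch below uses hard one-hot targets; the paper's soft variant would replace those with smoothed targets:

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over cosine similarities of
    L2-normalized image/text embeddings; pair k matches on the
    diagonal. Standard CLIP objective, not the paper's exact soft loss."""
    def norm(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    I = [norm(v) for v in image_emb]
    T = [norm(v) for v in text_emb]
    sims = [[sum(a * b for a, b in zip(i, t)) / temperature for t in T] for i in I]
    n = len(sims)
    loss_i = -sum(math.log(softmax(sims[k])[k]) for k in range(n)) / n   # image -> text
    cols = [[sims[r][c] for r in range(n)] for c in range(n)]
    loss_t = -sum(math.log(softmax(cols[k])[k]) for k in range(n)) / n   # text -> image
    return (loss_i + loss_t) / 2.0
```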

Thanh Binh Le, Hoang Nhat Khang Vo, Tan Ha Mai, Trong Nhan Phan
Toward Sustainable and Cost-Efficient HPC Systems: A Deep Reinforcement Learning Job Scheduling Approach

Hybrid-powered High Performance Computing (HPC) centers can reduce both operational costs and carbon footprints by aligning job execution with renewable supply and time-varying electricity prices. However, existing Deep Reinforcement Learning (DRL) schedulers often neglect price signals, rely on synthetic energy traces, or conflate job selection and delay, limiting their real-world applicability. This work introduces a cost-aware, multi-action DRL framework that embeds a hybrid energy cost model into an HPC scheduling environment, enabling agents to incorporate dynamic price signals and renewable forecasts while decoupling job selection from execution delay. Evaluated on production-scale workload traces and measured renewable profiles, the proposed scheduler consistently reduces energy costs, improves renewable utilization, and maintains competitive job responsiveness compared with heuristic and learning-based baselines. These results demonstrate a practical path toward economically and environmentally sustainable HPC operations.

Thanh Hoang Le Hai, Nhan Nguyen Phuc, Mai Nguyen Tran Phuong, Minh Bui Ngoc, Nam Thoai
Planning to Prove: Improving Informal Proofs of Olympiad Inequalities from Large Language Models

Recent advancements in Large Language Models (LLMs) have enabled them to perform exceptionally well on diverse natural language processing and reasoning challenges, with emergent abilities that increasingly mimic human cognition. While LLMs show promise in general mathematical problem-solving, their ability to generate informal, step-by-step proofs for Olympiad-level theorems that are both logically sound and human-verifiable remains underexplored. Existing research has primarily focused on formal theorem proving, which depends on strict symbolic systems and external verifiers, often neglecting the nuanced demands of informal mathematical reasoning. In this paper, we introduce Planning to Prove, a prompt-based framework designed to guide LLMs in producing coherent, step-by-step informal proofs for Olympiad-level inequalities. This approach leverages the inherent capacity of LLMs for multi-step reasoning and planning, enabling the model to first construct a structured outline of the proof before generating the detailed argument. The proposed method is evaluated on a curated set of inequality problems from Mathematical Olympiads and demonstrates that it improves both the correctness and interpretability of the generated proofs. The results highlight the critical role of structured reasoning in informal mathematical domains and suggest promising directions for enhancing LLM performance in advanced mathematical reasoning tasks.

Le Van Thanh, Do Xuan Trong, Pham Duc Tinh, Hai Van Pham
REBot: From RAG to CatRAG with Semantic Enrichment and Graph Routing

Academic regulation advising is vital for helping students interpret and comply with institutional policies, yet building effective systems requires domain-specific regulatory resources. To address this challenge, we propose REBot, an LLM-enhanced advisory chatbot powered by CatRAG, a hybrid retrieval–reasoning framework that integrates RAG with GraphRAG. CatRAG unifies dense retrieval and graph-based reasoning, supported by a hierarchical, category-labeled knowledge graph enriched with semantic features for domain alignment. A lightweight intent classifier routes queries to the appropriate retrieval modules, ensuring both factual accuracy and contextual depth. We construct a regulation-specific dataset and assess REBot on classification and question-answering tasks, achieving state-of-the-art performance with an F1-score of 98.89%. Finally, we implement a web application that demonstrates the practical value of REBot in real-world academic advising scenarios.

Thanh Ma, Tri-Tam La, Lam-Thu Le Huu, Minh-Nghi Nguyen, Khanh-Van Pham Luu
Global Positional Encoding and Its Application in Medical Image Segmentation

Transformer-based models have become increasingly popular for medical image segmentation. While they incorporate positional encoding to model spatial structure, this encoding only captures patch positions within a cropped 3D subvolume, not within the full anatomical scan. As a result, important global spatial context, such as organ location priors, is lost. This limitation is particularly harmful in medical scenarios where multiple organs may share similar intensity profiles but differ in anatomical position. In this work, we propose a lightweight Global Positional Encoding (GPE) module that injects absolute 3D spatial coordinates into transformer-based segmentation networks. GPE recovers lost anatomical information and enhances spatial awareness without significant overhead. We integrate GPE into four representative models (UNETR, Swin-UNETR, nnFormer, and UNETR++) and evaluate on the Synapse multi-organ CT dataset. Results show consistent performance gains across all models, with up to 1.66% improvement in Dice score and substantial reduction in HD95. These findings demonstrate that GPE effectively bridges the gap between local processing and global spatial reasoning, offering a simple yet powerful enhancement for medical segmentation networks.
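The core of the idea above, absolute patch coordinates inside the full scan rather than the crop, reduces to a simple normalization. The formulation below is hypothetical (the paper's module would additionally project these coordinates into the embedding space):

```python
def global_positional_encoding(patch_index, patch_size, crop_origin, volume_shape):
    """Absolute 3D position of a patch within the full scan, normalized
    to [0, 1] per axis: (crop_origin + local_patch_center) / volume_extent.
    These three scalars would then be projected and added to the patch
    embedding; the learnable projection is omitted here."""
    coords = []
    for axis in range(3):
        center = crop_origin[axis] + (patch_index[axis] + 0.5) * patch_size[axis]
        coords.append(center / volume_shape[axis])
    return coords
```

Two patches at the same local index in different crops receive different global coordinates, which is exactly the anatomical signal that crop-local positional encodings discard.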

Minh-Quy Le, Thanh-Sach Le
Integrating Graph Convolutional Networks and Clustering for Intelligent Recommendation System

In the era of information explosion, personalized recommendation systems have become indispensable tools for filtering relevant content for users. However, their performance is limited by challenges including the cold-start problem, sparse data, and difficulty in capturing complex user-item relationships. This research proposes HybridGCN-Ext, a novel deep learning approach that combines Graph Convolutional Networks (GCNs) with knowledge graphs and clustering techniques to address these limitations. Unlike traditional methods relying solely on collaborative or content-based filtering, our model leverages three distinct information sources: (1) user-item interaction patterns through a simplified LightGCN architecture, (2) semantic relationships between items through Knowledge Graph Convolutional Networks (KGCN), and (3) cluster-based information through attention mechanisms to enhance item representations. Experimental results demonstrate significant improvements across recommendation quality metrics. Our findings contribute to the recommendation systems field by demonstrating how structural knowledge and clustering can be effectively combined with graph neural networks to generate more accurate, diverse, and interpretable recommendations.
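The simplified LightGCN architecture that the model above builds on propagates embeddings over the user-item graph with no weights or nonlinearities, only symmetric degree normalization. A minimal sketch of one propagation layer:

```python
import math

def lightgcn_layer(embeddings, adj):
    """One LightGCN propagation step: each node's new embedding is the
    sum of its neighbours' embeddings, normalized by sqrt(deg(u) * deg(v)).
    `adj` maps node -> list of neighbours; no feature transform or
    activation is applied, matching LightGCN's simplification."""
    new = {}
    for node, vec in embeddings.items():
        acc = [0.0] * len(vec)
        for nb in adj[node]:
            norm = math.sqrt(len(adj[node]) * len(adj[nb]))
            for k, x in enumerate(embeddings[nb]):
                acc[k] += x / norm
        new[node] = acc
    return new
```

Final representations are typically an average over the layer outputs; the KGCN and cluster-attention branches the abstract mentions would contribute additional item-side signals on top of this.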

Thanh-Tung Dang, Thanh-Van Le
Backmatter
Title
Multi-disciplinary Trends in Artificial Intelligence
Edited by
Thanh Tho Quan
Chattrakul Sombattheera
Hoang-Anh Pham
Ngoc Thinh Tran
Copyright Year
2026
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-9549-60-3
Print ISBN
978-981-9549-59-7
DOI
https://doi.org/10.1007/978-981-95-4960-3

The PDF files of this book were created in accordance with the PDF/UA-1 standard to improve accessibility. This includes screen-reader support, described non-text content (images, graphics), bookmarks for easy navigation, keyboard-accessible links and forms, and searchable, selectable text. We recognize the importance of accessibility and welcome inquiries about the accessibility of our products. For questions or accessibility needs, please contact us at accessibilitysupport@springernature.com.
