Multi-disciplinary Trends in Artificial Intelligence
18th International Conference, MIWAI 2025, Ho Chi Minh City, Vietnam, December 3–5, 2025, Proceedings, Part II
- 2026
- Book
- Editors
- Thanh Tho Quan
- Chattrakul Sombattheera
- Hoang-Anh Pham
- Ngoc Thinh Tran
- Book Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Singapore
About this book
This 3-volume set constitutes the proceedings of the 18th International Conference on Multi-disciplinary Trends in Artificial Intelligence, MIWAI 2025, held in Ho Chi Minh City, Vietnam, during December 3–5, 2025.
The 110 full papers presented in these proceedings were carefully reviewed and selected from 306 submissions. The papers focus on various topics in AI and its applications, such as deep learning, machine learning, computer vision, pattern recognition, and natural language processing.
Table of Contents
Frontmatter
Solving the Student-Project Allocation Problem with Preferences over Projects Using a Multi-start Local Search
Le Quoc Anh, Nguyen Nhu Son, Son Thanh Cao, Hoang Huu Viet
Abstract: The Student-Project Allocation problem with Preferences over Projects is a well-studied many-to-one stable matching problem with practical relevance in academic assignment systems. Finding a maximum stable matching that assigns the largest possible number of students to their acceptable projects is known to be NP-hard. In this paper, we introduce a multi-start local search algorithm to tackle this challenge, where each local search begins with a random matching and iteratively eliminates undominated blocking pairs to improve its stability. When a local search reaches a stable but incomplete matching, it applies a perturbation strategy to reassign unallocated students based on their preferences, thereby encouraging further improvements. The algorithm repeats this process across multiple local searches and returns the largest stable matching found. Experimental results demonstrate that our approach efficiently achieves large stable matchings within a reasonable computational time, even for large-scale instances.
A Small Language Model and Domain-Specific Resources for Vietnamese Public Services
Van Thai Le, Anh-Cuong Le
Abstract: Recent advancements in large language models (LLMs) have enabled impressive capabilities across general-purpose tasks. However, applying these models to low-resource domains such as public administrative services in Vietnam remains a challenge due to data scarcity, domain complexity, and computational constraints. To address this, we present VietPSLM (Vietnamese Public Service Language Model), a compact, instruction-following language model fine-tuned for Vietnamese public service question answering. Our approach includes domain-adaptive pretraining, supervised fine-tuning, and a two-pass inference strategy to improve clarity and factual accuracy. Alongside VietPSLM, we release a suite of domain-specific datasets, including an unlabeled corpus for pretraining, a factual QA dataset, and two evaluation benchmarks. Despite its small size, VietPSLM delivers competitive performance, approaching the accuracy of larger proprietary systems such as Gemini 2.0 Flash. These results highlight that targeted adaptation and high-quality data can enable lightweight models to perform effectively in real-world governmental settings.
Multimodal Tree Crown Detection and Carbon Stock Estimation from Remote Sensing Imagery
Manh-Tan Doan, Quoc-Ngoc Ly
Abstract: Accurate estimation of forest carbon stocks from remote-sensing imagery is critical for climate monitoring and ecosystem management. We propose HAF R-CNN (Height-perceptual Attention Fusion R-CNN), a Faster R-CNN extension that fuses RGB (red-green-blue) and canopy height model (CHM) features via multi-level cross-attention to improve tree crown detection. Unlike previously proposed RGB-CHM fusion approaches that rely mainly on simple concatenation or late fusion, our cross-attention design enables richer bidirectional structural-spectral interactions. We further develop a tree-level carbon estimation pipeline combining crown structural descriptors and vegetation indices with a Random Forest regressor. On the NEON dataset, HAF R-CNN outperforms baselines in detection, and the pipeline achieves reliable carbon estimation at the OSBS site (\(R^2\) = 0.67, MAE = 59.6 kg), though performance decreases in denser forests (MLBS). These results highlight both the promise and current limitations of multimodal detection for scalable carbon monitoring.
A Classificatory Topos: Refining Evolving Knowledge in Multi-agent Learning Systems
Manuel Hernández, Eduardo Sánchez-Soto, C.H. Castañeda-Roldán
Abstract: We propose a Classificatory Topos, a mathematical framework to model the dynamic evolution of knowledge within a finite system of interacting learning machines. Following guidelines of category theory, the construction establishes a Grothendieck topos, \(\text{Sh}(C_{\text{learn}}, J)\), as a mathematical universe for this problem domain. By defining a base site on a category of epistemic states with causal morphisms, and equipping it with a Grothendieck topology that formalizes a logic of justification, the framework provides a rich, non-linear model of system evolution. The use of sheaves ensures causal consistency, while the internal logic of the topos, governed by a subobject classifier, provides the machinery to trace, verify, and explain the refinement of classifications.
A Survey of Algorithmic and Contextual Decomposition Methods Across Language Model Pipelines
Mengyao Zhu, Phuc Huu Nguyen
Abstract: As language models continue to scale in parameter count and reasoning complexity, achieving efficient adaptation and interpretable inference has become a pressing challenge. Decomposition offers a versatile paradigm that addresses both concerns, enabling structural reduction during model training and logical segmentation during inference. This survey presents a comprehensive analysis of 29 decomposition methods categorized into algorithmic and contextual types. We examine how these methods restructure weight matrices, control update granularity, and decompose information content to support downstream tasks. Through extensive interpretations for each of these methods, we highlight the growing role of decomposition as both a mathematical and reasoning tool. Our findings offer a structured reference for advancing efficiency, modularity, and interpretability in large language models, with implications for research in model training, algorithm optimization, and knowledge-grounded inference. This work targets researchers and practitioners seeking scalable solutions across NLP domains where resource constraints and reasoning depth demand more than traditional modeling and inference pipelines.
Uncertainty Quantification for Flood Forecasting in Small Catchments
Michel Spils, Sven Tomforde
Abstract: The quantification of uncertainty in flood forecasting, especially in small catchments, presents significant challenges due to the inherent uncertainty in precipitation nowcasts and forecasts. Several authors have investigated uncertainty quantification for time series forecasting in general and for flood forecasting specifically. To our knowledge, none have focused on small catchments, tidal influences, large hourly forecast horizons, or the incorporation of classic forecasting tools such as differencing. We implement and evaluate approaches using Conformal Prediction, Monte Carlo Dropout, Ensembles and direct distribution forecasting to quantify the uncertainty of LSTM networks trained to predict the change in water level for the next 48 h in three catchments in Northern Germany. We analyze the performance of the different approaches regarding the width and accuracy of the prediction intervals. Our study shows that the incorporation of differencing strongly influences which uncertainty quantification methods are suitable, with direct distribution forecasting ignoring correlations between the forecasting steps and Conformal Prediction being the most suitable for our specific datasets.
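As a rough illustration of how a conformal prediction interval can be formed around a point forecast, the sketch below uses split conformal prediction with absolute residuals. The abstract does not say which conformal variant the authors use, so the variant, the function names, and the calibration values here are all assumptions made for illustration.

```python
import math


def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of absolute calibration residuals."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    # Standard split-conformal rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return scores[rank - 1]


def prediction_interval(point_forecast, residuals, alpha=0.1):
    """Symmetric interval around a point forecast with target coverage 1 - alpha."""
    q = conformal_quantile(residuals, alpha)
    return point_forecast - q, point_forecast + q


# Hypothetical calibration residuals (observed minus predicted change in water level).
calibration_residuals = [-3.0, 1.5, -0.5, 2.0, 4.0, -1.0, 0.5, -2.5, 3.5]
low, high = prediction_interval(10.0, calibration_residuals, alpha=0.2)
print(low, high)
```

The interval width depends only on the held-out residuals, which is why the choice of forecast target (for example, a differenced series) can change how well such a method fits.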
Efficient Learning of Horn Formulas over Finite Totally Ordered Domains
Miki Hermann, Gernot Salzer
Abstract: One of the core activities in learning is the synthesis of concepts and their relationships by generalizing from positive and negative samples. Information hidden in data becomes explicit, relations emerge that provide insights and serve as explanations. Nowadays, machine learning is able to process large quantities of data and to build models that classify new data with such success that the algorithms are termed ‘intelligent’. With most approaches, though, the models are black boxes: It is usually difficult to explain why a model classifies data in a particular way. Our work is an approach to machine learning that provides explanations. We present algorithms that construct if-then rules (Horn clauses) from samples and counter-samples. We generalize results for binary data to finite totally-ordered domains by relying on literals of the form \(x\ge d\) and \(x\le d\), with x and d being domain variables and constants, respectively. This way we avoid the need to binarize data over such domains, which usually entails an increase in variables and output that is hard to interpret. We present both an offline and an online algorithm. In the first case, the positive and negative samples are known from the outset, while in the second case the samples arrive one by one and lead to incremental changes to the formula. Both algorithms are linear in the number of positive and negative samples as well as in the number of variables, while the size of the resulting formula does not depend on the number of positive samples. Besides analyzing the asymptotic complexity, we use a C++ implementation to evaluate the algorithms on some datasets. We conclude with a discussion of various extensions.
Knowledge-Enhanced Vietnamese Paraphrase Identification
Minh Lu Xuan, Minh Nguyen Hong, Loc Nguyen Xuan, Thai Do Thanh, Duc Bui Tien, Quang Tran Minh
Abstract: Paraphrase identification (PI) is a fundamental task in natural language processing (NLP) that determines whether a pair of sentences conveys the same meaning. This task plays a crucial role in various applications such as machine translation, computer-assisted translation, and question answering. While extensive research has been conducted in English and several other languages, Vietnamese PI remains relatively underexplored. Pretrained language models (PLMs) have become the standard approach for tackling language understanding tasks, including PI. However, despite their rapid advancement, these models are still limited in their capacity to capture external knowledge. In this study, we propose a novel architecture that integrates PLMs with external knowledge for Vietnamese PI. Experimental results show that our approach, using mBERT as a base model, achieved an F1-score of 95.59% on a combined corpus consisting of vnPara and an additional 1498 sentence pairs enriched with diverse entities. This highlights the effectiveness of our approach in distinguishing named entities and understanding external knowledge.
Co-NAML-LSTUR: A Combined Model with Attentive Multi-view Learning and Long- and Short-Term User Representations for News Recommendation
Minh Hoang Nguyen, Thuat Thien Nguyen, Minh Nhat Ta, Tung Le, Huy Tien Nguyen
Abstract: News recommendation systems play a critical role in alleviating information overload by delivering personalized content. A key challenge lies in jointly modeling multi-view representations of news articles and capturing the dynamic, dual-scale nature of user interests, encompassing both short- and long-term preferences. Prior methods often rely on single-view features or insufficiently model user behavior across time. In this work, we introduce Co-NAML-LSTUR, a hybrid news recommendation framework that integrates NAML for attentive multi-view news encoding and LSTUR for hierarchical user modeling, designed for training on limited data resources. Our approach leverages BERT-based embeddings to enhance semantic representation. We evaluate Co-NAML-LSTUR on two widely used benchmarks, MIND-small and MIND-large. Results show that our model significantly outperforms strong baselines, achieving improvements over NRMS by 1.55% in AUC and 1.15% in MRR, and over NAML by 2.45% in AUC and 1.71% in MRR. These findings highlight the effectiveness of our efficiency-focused hybrid model, which combines multi-view news modeling with dual-scale user representations for practical, resource-limited settings rather than a claim to absolute state-of-the-art (SOTA). Code is publicly available at https://github.com/MinhNguyenDS/Co-NAML-LSTUR.
ScatterRAG: A Framework for Decentralized Graph Routing in RAG System
Nhat Ho Minh, Long Le Pham Tien, Kien Nguyen Trung, Trong Nhan Phan
Abstract: Retrieval-Augmented Generation (RAG) systems increasingly leverage structured knowledge graphs to enhance factual accuracy and interpretability. However, scaling such systems introduces challenges in routing queries efficiently across partitioned knowledge sources, particularly under memory and bandwidth constraints. We propose ScatterRAG, a lightweight and scalable routing framework for distributed RAG over partitioned knowledge graphs. ScatterRAG employs Bloom filters for compact indexing and fast negative lookups, combined with fuzzy matching to address lexical variation and noisy queries. This approach enables efficient query filtering and routing without centralized control or exhaustive broadcast. Experimental evaluations on the Natural Questions benchmark show that ScatterRAG achieves acceptable memory usage while maintaining scalability in distributed environments. Although its current prototype yields slower inference and lower retrieval accuracy compared to centralized baselines, ScatterRAG demonstrates a practical balance between scalability and resource efficiency, providing a promising foundation for future research on decentralized and efficient RAG architectures.
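To make the idea of Bloom-filter routing concrete, here is a minimal sketch of how a query entity might be forwarded only to partitions whose filter cannot rule it out. It is not ScatterRAG's implementation: the `BloomFilter` class, the `route` function, the partition names, and the filter sizes are hypothetical, and the fuzzy-matching step mentioned in the abstract is omitted.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive several bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def route(entity, partition_filters):
    """Return the partitions whose filter does not rule the entity out."""
    key = entity.lower()
    return [name for name, bf in partition_filters.items() if bf.might_contain(key)]


# Hypothetical partitions of a knowledge graph, keyed by entity name.
partitions = {"geo": ["hanoi", "vietnam"], "people": ["ada lovelace"]}
filters = {}
for name, entities in partitions.items():
    bf = BloomFilter()
    for entity in entities:
        bf.add(entity)
    filters[name] = bf

print(route("Hanoi", filters))
```

A Bloom filter never produces false negatives, so a partition that holds the entity is always included; false positives only cost an unnecessary lookup.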
A Competition-Based Large Neighborhood Search for Vessel Routing Optimization Deriving for Sustainable Marine Debris Cleanup
Trinh Duc Minh, Tat-Hien Le
Abstract: The rapid increase in global solid waste generation has intensified marine pollution, as waste flows from land to oceans through rivers and canals. Studies estimate that millions of metric tons of plastic enter the ocean annually, dispersing widely due to ocean currents, tides, and winds. This accumulation on coastlines and the ocean surface poses significant environmental, economic, and health risks. Global initiatives have been introduced to improve waste management and reduce marine litter, but waste collection remains a logistical challenge, particularly in developing countries with limited resources. Optimization models, particularly the Capacitated Vehicle Routing Problem (CVRP), have been widely applied to enhance collection efficiency. Recent research extends these models to marine debris collection, integrating environmental data and predictive models to optimize vessel routing while minimizing costs and emissions. This study advances prior work by proposing a Competition-Based Large Neighborhood Search (CLNS) algorithm for large-scale Marine Debris Collection Problems (MDCP) and a modified version of a state-of-the-art hybrid algorithm from the literature. The effectiveness of our method is validated through comparative evaluations on multiple test cases.
Bridging Usability and User Experience in AI-Based Fall Detection: A Systematic Literature Review on User-Centered Design Approach for Enhanced Adoption
Anwar E. Khidzir, Waidah Ismail, Mahadi Bahari, Ali Y. Aldailamy
Abstract: This study explores the role of User-Centered Design (UCD) in improving the usability and adoption of AI-based fall detection systems. Despite advances in artificial intelligence and wearable healthcare technology, usability challenges, including alert fatigue, complex interfaces, and poor customization, limit system effectiveness and user adoption. This paper systematically reviews the usability challenges, UX solutions, and technology adoption models related to AI-driven fall detection using a PRISMA-based SLR methodology. Thematic analysis of selected studies identifies key usability barriers, UX enhancement strategies, and the role of TAM & UTAUT adoption models in driving system engagement. Findings suggest that multimodal UX, AI explainability, and adaptive alerts significantly improve adoption rates. The study provides design recommendations and future research directions for enhancing AI-driven fall detection through UCD methodologies.
Autoencoder-Based Deep Features for Internal Defect Detection in Mangoes Using NIR Spectral Data
D. Nandini, D. S. Guru
Abstract: Near-Infrared (NIR) spectroscopy offers a powerful non-destructive approach for detecting internal defects in mangoes, such as spongy tissue. In this study, both lower wavelength (LW) and higher wavelength (HW) NIR spectral regions were analyzed. A customized preprocessing pipeline was implemented using techniques such as Savitzky–Golay smoothing, Multiplicative Scatter Correction (MSC), Extended MSC (EMSC), Standard Normal Variate (SNV), and detrending to enhance spectral quality. For dimensionality reduction, a tailored Autoencoder with Multi-Head Attention (AE-MHA) was employed. Extensive experiments were conducted using both AE-MHA and boosting-based classifiers (LightGBM and XGBoost), with all models evaluated using 5-fold cross-validation. The AE-MHA model, particularly on HW data, achieved the highest performance: 93.75% overall accuracy and 100% accuracy in identifying defective mangoes. The HW spectral region significantly outperformed LW and also surpassed previous methods reported by Guru and Nandini (2025). This confirms the effectiveness of the proposed non-invasive framework for real-time detection of internal fruit defects, showing strong potential for integration into smart agriculture systems.
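Of the preprocessing steps named in this abstract, Standard Normal Variate (SNV) is the simplest to show: each spectrum is centered on its own mean and divided by its own standard deviation. The sketch below is a generic illustration, not the authors' pipeline; the function name, the use of the sample standard deviation, and the example values are assumptions.

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate: center and scale one spectrum by its own statistics."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    if std == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(value - mean) / std for value in spectrum]


# Hypothetical absorbance readings at five wavelengths for a single sample.
raw_spectrum = [0.42, 0.45, 0.51, 0.48, 0.44]
corrected = snv(raw_spectrum)
print(corrected)
```

Because the correction is applied per spectrum, it needs no reference spectrum, unlike MSC, which fits each spectrum against a reference such as the dataset mean.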
Robust 3D Virtual Try-On Under Complex Poses
Nguyen Dinh Hieu, Pham Thi Van Anh, Do Ngoc Bich, Tran Tien Long, Pham Hong Phong
Abstract: This paper presents Pose-Robust 3D Virtual Try-On Network (PR-VTON), a novel framework addressing the persistent challenges of achieving realistic 3D virtual try-on under complex human poses. In contrast to prior works that primarily handle frontal or simplified postures, PR-VTON is designed to accommodate extreme variations such as side-facing stances, crossed arms, and severe self-occlusions, conditions that often lead to geometric distortions and inconsistent garment rendering. The proposed approach integrates a personalized diffusion model with a pose-aware 3D Gaussian Splatting editing pipeline, enabling fine-grained garment transfer while preserving high-fidelity geometry and texture across multiple viewpoints. To support training and evaluation, a curated and pre-processed dataset named PR-VTON3D is introduced, containing diverse clothing types and challenging poses that offer realistic scenarios for robust model development. Through a reference-driven multi-view editing strategy and a multi-level attention fusion mechanism, PR-VTON achieves superior cross-view consistency, garment similarity, and visual realism compared to state-of-the-art baselines. Experimental results and user studies demonstrate that the proposed framework significantly enhances the reliability of 3D virtual try-on systems in real-world conditions, establishing a new benchmark for pose-invariant garment transfer. Code is available at: https://github.com/nguyendinhhieu1309/PR-VTON.
Community Detection in Complex Overlapping Networks Using Graph Autoencoders with Semi-supervised Fuzzy Clustering
Nguyen Hai Yen, Vo Duc Quang, Tran Dinh Khang, Phan Anh Phong
Abstract: Detecting overlapping communities is a fundamental challenge in network analysis. Communities represent groups of nodes that are more densely connected internally than to the rest of the network, often corresponding to functional modules or social groups in real-world systems. This paper proposes a hybrid framework called GAE.SC that combines a Graph Autoencoder (GAE) with a semi-supervised fuzzy clustering algorithm to address this task. Our model first employs a GAE, trained directly on the input network, to learn low-dimensional, structure-preserving node embeddings from both graph topology and node attributes. These embeddings subsequently serve as input for a semi-supervised clustering algorithm, which incorporates supervision constraints to enhance the accuracy and interpretability of the community detection process. Experimental results on five benchmark datasets demonstrate that our method consistently outperforms GCNFCM, a validated and robust baseline for overlapping community detection in complex networks.
Graph-Attention Policy Gradient Framework for Adaptive Traffic Signal Control in Complex Urban Networks
Nguyen Van Hieu, Nguyen Phuong Linh, Do Ngoc Bich, Nguyen Dinh Hieu
Abstract: Traffic congestion remains a critical challenge for modern cities, causing economic loss, environmental damage, and reduced quality of life. Traditional signal control methods such as fixed-time and actuated systems cannot adapt to highly dynamic conditions, particularly in Hanoi, Vietnam, where irregular road layouts, non-lane-based driving, and motorcycle dominance create extreme variability. We propose GATLIGHT (Graph-Attention Traffic Light Control), a deep reinforcement learning framework that combines policy gradient optimization with a graph attention network to achieve decentralized yet coordinated signal control. Each intersection is modeled as an intelligent agent that exchanges attention-weighted information with its neighbors, enabling adaptive and network-aware decisions. The state representation encodes normalized vehicle counts and signal timing, while the reward minimizes the standard deviation of traffic distribution to balance flows and reduce bottlenecks. GATLIGHT is trained and evaluated in SUMO using both synthetic networks and real-world datasets from New York, Hangzhou, Jinan, and Hanoi. Compared to state-of-the-art methods such as PressLight and CoLight, GATLIGHT achieves up to 26.7% lower travel time and consistently reduces waiting time and queue length under real-world Hanoi traffic. These results demonstrate that graph-based reinforcement learning provides a scalable and robust solution for adaptive traffic management in some of the most complex and volatile urban environments. Code is available at: https://github.com/nvanhieu25/tsinghuaRL.
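One plausible reading of a reward that "minimizes the standard deviation of traffic distribution" is the negative standard deviation of normalized vehicle counts across the approaches of an intersection. The sketch below illustrates that reading only; the function name, the normalization by a lane capacity, and the example counts are assumptions, not the authors' definition.

```python
import statistics


def balance_reward(vehicle_counts, capacity):
    """Negative population standard deviation of normalized counts per approach."""
    # Normalize each count to [0, 1] by an assumed per-approach capacity.
    normalized = [min(count / capacity, 1.0) for count in vehicle_counts]
    # An even distribution gives zero spread, which is the largest possible reward.
    return -statistics.pstdev(normalized)


# Hypothetical vehicle counts on the four approaches of one intersection.
balanced = balance_reward([5, 5, 5, 5], capacity=20)
skewed = balance_reward([0, 10, 0, 10], capacity=20)
print(balanced, skewed)
```

Under this formulation an agent is rewarded for evening out queues rather than for clearing any single approach, which matches the stated goal of balancing flows.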
CLIP-AMR-GPT: Enhancing Image Captioning via Cross-Modal Semantics Fusion and GPT-Based Re-ranking
Nguyen Van Thinh, Tran Van Lang, Nguyen Minh Hai
Abstract: Current image captioning models still face several critical challenges: insufficient exploitation of semantic knowledge, the absence of effective mechanisms to integrate heterogeneous feature sources, and limitations in generating natural and coherent output captions. To address these issues, this paper proposes a novel image captioning model, CLIP-AMR-GPT, built upon an encoder–decoder architecture that integrates multi-source knowledge. Specifically, the encoder combines vision–language features extracted from the CLIP model, relational graph embeddings representing semantic relationships among objects, and Abstract Meaning Representation (AMR) embeddings derived from ground-truth captions. The decoder employs an adaptive attention mechanism to dynamically regulate the influence of AMR-like graph embeddings at each word generation step, thereby enabling flexible exploitation of semantic structural information. Furthermore, a GPT-2-based re-ranking module is incorporated to evaluate and select captions with the highest linguistic likelihood, enhancing fluency and coherence. Experimental evaluations on the MS COCO benchmark dataset demonstrate that the proposed model outperforms state-of-the-art methods, validating the effectiveness of integrating visual, semantic, and linguistic knowledge into image captioning models.
- Title
- Multi-disciplinary Trends in Artificial Intelligence
- Editors
- Thanh Tho Quan
- Chattrakul Sombattheera
- Hoang-Anh Pham
- Ngoc Thinh Tran
- Copyright Year
- 2026
- Publisher
- Springer Nature Singapore
- Electronic ISBN
- 978-981-9549-60-3
- Print ISBN
- 978-981-9549-59-7
- DOI
- https://doi.org/10.1007/978-981-95-4960-3
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.