Multi-disciplinary Trends in Artificial Intelligence
18th International Conference, MIWAI 2025, Ho Chi Minh City, Vietnam, December 3–5, 2025, Proceedings, Part I
- 2026
- Book
- Editors
- Thanh Tho Quan
- Chattrakul Sombattheera
- Hoang-Anh Pham
- Ngoc Thinh Tran
- Book Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Singapore
About this book
This 3-volume set constitutes the proceedings of the 18th International Conference on Multi-disciplinary Trends in Artificial Intelligence, MIWAI 2025, held in Ho Chi Minh City, Vietnam, during December 3–5, 2025.
The 110 full papers presented in these proceedings were carefully reviewed and selected from 306 submissions. The papers focus on various topics in AI and its applications, such as deep learning, machine learning, computer vision, pattern recognition, and natural language processing.
Table of Contents
Analogical Proportions and Probabilities: Are They Compatible?
Henri Prade, Gilles Richard
Abstract
Analogical proportions link four items a, b, c and d by a relation stating that “a is to b as c is to d,” a, b, c, d being the formal representation of 4 real world entities such as profiles. As such, a, b, c, d could be atomic values (like Boolean, real or nominal values), or more generally vectors thereof. In this context, we may attach to a vector representing the profile of a group of individuals in a given population, the probability (frequency) of this group in the population. This raises the question of whether four profiles, whose descriptions form an analogical proportion, also have their attached probabilities forming an analogical proportion when viewed as real numbers. This issue is explored and illustrated in this paper, which also provides some new results on numerical analogical proportions.
A Deep-Learning Framework for Land-Sliding Classification from Remote Sensing Image
Quang-Hieu Tang, Nhat-Truong Vo Dinh, Dong-Dong Pham, Quoc-Toan Nguyen, Lam Pham, Truong Nguyen
Abstract
The use of satellite imagery combined with deep learning to support automatic landslide detection is becoming increasingly widespread. However, selecting an appropriate deep learning architecture to optimize performance while avoiding overfitting remains a critical challenge. To address these issues, we propose a deep learning based framework for landslide detection from remote sensing image. The proposed framework presents an effective combination of online and offline data augmentation to tackle the imbalanced data, a backbone EfficientNetV2-Large deep learning model for extracting robust embedding features, and a post-processing SVM classifier to balance and enhance the classification performance. The proposed model achieved an F1-score of 0.8938 on the public test set of the Zindi challenge.
DPDiff: Blind Image Inpainting with Dual-Prior Diffusion
Hoai Trung Nguyen, Trong Nhan Ho, Duc Dung Nguyen
Abstract
Blind image inpainting aims to restore images degraded by unknown corruption, where the locations and shapes of missing regions are unspecified at inference time. Existing methods typically separate mask estimation and image restoration into sequential stages, which can lead to error propagation and poor integration of structural priors, especially when dealing with complex or diverse artifacts. In this paper, we present Dual-Prior Diffusion (DPDiff), a novel framework that addresses blind inpainting by jointly predicting corruption masks and reconstructing edge maps. This simultaneous prediction of dual priors complementarily rectifies one another, significantly reducing error accumulation and enabling robust, structure-aware restoration. Leveraging these learned priors, DPDiff guides a diffusion model to generate restored images that both preserve geometric fidelity and blend seamlessly with uncorrupted content. Extensive experiments demonstrate that DPDiff sets a new state-of-the-art on multiple blind inpainting benchmarks using only four diffusion timesteps, and generalizes strongly to related image restoration tasks such as image deraining and watermark removal.
AFC-Net: Attention-Guided Multi-backbone Deep Fusion for Grape Leaf Disease Classification
Hoang-Tu Vo, Nhon Nguyen Thien, Kheo Chau Mui, Huan Lam Le, Phuc Pham Tien, Hieu Nguyen Trung, Vuong Nguyen Phuc
Abstract
Grape leaf diseases pose a significant threat to vineyard productivity and quality, underscoring the importance of early and accurate diagnosis. This paper introduces AFC-Net, an Attention-Guided Fusion Convolutional Network designed to enhance classification performance through the integration of multiple backbone architectures. Our model combines ResNet50, EfficientNetB0, and MobileNetV2 to extract various and complementary features from input images. These features are fused using an attention-based fusion mechanism that dynamically learns the contribution weights (\(\alpha_1\), \(\alpha_2\), \(\alpha_3\)) for each backbone, allowing the model to emphasize the most informative representations. We evaluated our approach in the Grape disease dataset, which includes four classes: Black Rot, ESCA, Healthy, and Leaf Blight. AFC-Net achieves a classification accuracy of 99.83%, with a macro-averaged F1-score of 99.84%. Notably, the model attains perfect F1-scores for both Healthy and Leaf Blight classes, demonstrating exceptional robustness across categories. To enhance model transparency, we employ Grad-CAM to visualize class-discriminative regions, providing valuable insights into the model’s decision-making process. These results highlight AFC-Net as a promising solution for real-world grape disease detection, contributing to the advancement of precision agriculture.
Towards Compact and Efficient Vietnamese Domain-Specific LLMs via Knowledge Distillation
Huy Hoang Le Nguyen, Hiep Xuan Huynh
Abstract
Large language models (LLMs) with billions of parameters achieve impressive results on domain-specific tasks but are often too resource-intensive for practical deployment. Knowledge distillation (KD) is an effective technique for compressing large models. It transfers knowledge from a teacher model to a smaller student model while maintaining performance. In this work, we focus on distilling SeaLLMs, which are large language models specifically designed for Southeast Asian languages (including Vietnamese). We transfer knowledge from a 7-billion-parameter teacher model to a smaller 1.5-billion-parameter student model. Our experiments on a Vietnamese domain-specific knowledge base show that KD from the fine-tuned QLoRA 7B teacher achieves a BERTScore-F1 of 0.7217, outperforming distillation from the base 7B teacher (0.7065) and fine-tuned student models without distillation. Moreover, the distillation process reduces the model size by approximately 78%, enabling more efficient deployment of domain-specialized LLMs. These findings demonstrate that teacher model quality critically impacts KD effectiveness and provide practical guidance for building compact, high-performance LLMs for specialized Vietnamese knowledge domains.
Early Prediction Under Class Imbalance for a Programming Course: Feature Selection and Data Augmentation
Huy Tran, Quoc-Huy Le, Tien Vu-Van, Thi-Thiet Pham, Nguyen Huynh-Tuong, Khoa D. Vo
Abstract
Predicting student performance in programming courses is critical for timely support and improved learning outcomes. We leverage LMS data from a flipped-classroom Programming Fundamentals course to predict in-lab performance from weekly pre-class and in-class assignments in a rolling, week-by-week setup with a five-level outcome. Building on published results that identify Random Forest (RF) as a strong baseline, we fix the predicting process and add two components: (i) Recursive Feature Elimination with Cross-Validation (RFECV) for compact feature selection; and (ii) training-only class-imbalance handling with SMOTE and a GAN-based augmenter. On a cohort of 786 students across four in-lab assessments, RFECV yields small, consistent gains (mainly at the extremes), while augmentation is the main driver of class-wise balance: GAN brings the most reliable macro-F1 improvements. Ablation further shows that training the generator in the full feature space outperforms pairing GAN with early pruning. Overall, the approach improves recognition of mid-performing students without degrading performance at the extremes, enabling earlier, targeted actions.
ParkiDxAI: An Explainable AI System for Parkinson’s Disease Diagnosis
Huynh-Dai-Nhan Tran, Minna Isomursu, Manh-Hung Trinh, Tan-Nguyen Ngo, Gia-Hau Le, Hoang-Anh Pham
Abstract
Developing trustworthy AI systems in digital health remains a challenge, particularly in terms of explainability and reliability in clinical use. We present ParkiDxAI, a web-based clinical decision support system (CDSS) for Parkinson’s disease that integrates prediction, explanation, data storage, and communication of results. Using a real-world tabular dataset of 2,105 individuals with 32 variables, we benchmarked 10 machine-learning models on an independent test set. Within this intra-study comparison, CatBoost achieved the highest accuracy at 93.59%. ParkiDxAI provides dual-level explanations, global (SHAP) and local (LIME), and quantifies explanation quality using faithfulness, fidelity, and sparsity. The system has been developed with a FastAPI backend, a MySQL database, and a React interface for scalable deployment. A small usability assessment suggested that the explanations and UI were clear and usable. Overall, ParkiDxAI aims to bridge the gap between model performance and clinical usability and can be adapted to other tabular clinical tasks.
Grad-CAM-Driven Explainable Deep Learning Framework for Cervical Cancer Image Classification
Huynh-Dai-Nhan Tran, Tan-Phuoc Pham, Manh-Hung Trinh, Tan-Nguyen Ngo, Tan-Hung Nguyen, Hoang-Anh Pham
Abstract
Many existing Deep Learning (DL)-based approaches for cervical cancer diagnosis lack comprehensive architectural comparisons and fail to effectively integrate Explainable AI (XAI) methods, such as Grad-CAM. This study proposes a cervical cancer classification framework that utilizes several state-of-the-art DL models and employs Grad-CAM to generate heatmaps highlighting the specific image regions that influenced the model’s predictions. These visual explanations improve transparency and support clinical interpretation, thereby enhancing trust in AI-assisted diagnostic systems. The experimental results demonstrate that the proposed framework not only achieves high classification accuracy but also provides valuable visual insights, contributing to the development of more interpretable and reliable AI tools for medical image diagnosis.
Hybrid Genetic Algorithm with Caputo Fractional Derivative for Ambulance Routing Problems
Jackel Vui Lung Chew
Abstract
The ambulance routing problem plays a critical role in emergency medical services, where timely ambulance response can significantly impact patient outcomes. However, urban traffic congestion and dynamic routing conditions pose challenges to existing optimization approaches. This study proposes CaputoGA, a novel hybrid metaheuristic that integrates the Caputo fractional derivative into the genetic algorithm framework. By introducing a memory-aware penalty term and adaptive mutation rate based on this fractional derivative, CaputoGA enhances convergence behavior and routing stability. The algorithm is further enhanced with a hybrid local search mechanism combining 2-opt and swap heuristics to refine best routes. Experimental evaluations on benchmark datasets adapted from the vehicle routing problem demonstrate that CaputoGA outperforms standard GA and two state-of-the-art hybrid algorithms, namely K-means simulated annealing tabu search and priority-based adaptive particle swarm optimization, in both route cost and robustness. Experimental results show that CaputoGA substantially reduces the percentage difference relative to the known cost and can achieve the lowest percentage difference at 3.98%. CaputoGA also requires shorter execution time in completing best routes calculation compared to GA. These results validate the efficacy of implementing memory-driven dynamics into evolutionary search. This work highlights the potential of fractional calculus in optimizing complex routing problems and provides a foundation for future extensions into other metaheuristic approaches.
Formal Analysis of Ethical Autonomous Systems Under Uncertainty
Joanna Godawa, Krishnendu Ghosh
Abstract
Autonomous systems have become an important fixture in modern day life. Certification of ethical decision-making in autonomous systems is necessary to minimize harm. In this work, a formalism of ethical decision making under uncertainty is described. The ethical principles are incorporated using Deontic logic rules of obligations, forbidden actions, and permissible actions. The uncertainty in the environment is modeled using an interval discrete-time Markov chain. A tractable probabilistic model checking on the interval discrete-time Markov chain model is constructed for evaluating ethical decision making in a system. Results from experiments using probabilistic model checking on a tractable interval discrete-time Markov chain model, with statistical inference conducted on a prototype of a moving aircraft, are presented.
Opposite Color Multiscale Local Binary Pattern Features for the Prediction of Bread Edibility
Kavitha Rajamani, Guru Devanur S
Abstract
Bread is one of the most widely consumed staple bakery foods in the world. Quality is a remarkable concern as it is a consumable food product. It depends on the raw ingredients and baking process involved during the preparation. After the purchase of the bread, the quality and in turn the shelf life of the bread are likely to be affected by the storage method. Hence, the edibility of the bread needs to be estimated. Most studies estimate this using sensory attribute measurements like strange odor, crust color, taste, aroma, hard texture and mold formation. In contrast, this study attempts to examine the edibility effortlessly through images. A new variant of texture based Local Binary Pattern features is proposed for the prediction of edibility through analysis of hard texture and mold formation. As there is no benchmark bread sample dataset available for the study, a new dataset of 18,513 images is created. It is observed from the experimentation that the proposed Opposite Color Multiscale Local Binary Pattern features provide good estimation on majority voting with reduced number of features through feature transformation and selection. The accuracy obtained is 0.8493 which is comparable with other common variants of local binary pattern features. Multiple classifiers are evaluated during experimental analysis and the ensemble approach performs best. As this is a contemporary problem addressed in the domain based on images of bread being first of its kind, it is likely to open up new challenges to be undertaken.
Saliency Retargeting Considering Aesthetics Quality Based on Deep Learning
Kazuki Koike, Ryuichi Egoshi, Hironori Takimoto
Abstract
This paper proposes two deep learning approaches for saliency retargeting while preserving aesthetic quality in images, addressing the limitations of conventional methods that focus solely on maximizing object saliency. The first approach, “OperatorNet”, estimates the intensity of multiple image editing operators to enhance the main object, utilizing EfficientNet-lite3 and multitask learning. The second approach, “GeneratorNet”, directly transforms images at the pixel level using the U-Net architecture. Both quantitative evaluations and subjective human assessments were conducted on images generated by the proposed methods. Experiments using the MS-COCO dataset confirmed that the proposed approaches achieved superior saliency retargeting while considering aesthetic quality compared to existing methods.
MBAAF: Multi-branch Lightweight Architecture for Audio Spoofing Detection with Temporal Gating and CBAM-Based Attention Fusion
Khanh-Duy Cao-Phan, Thi Phuc Dang
Abstract
With the rapid advancement of AI-driven speech synthesis and voice conversion technologies, deepfake audio has emerged as a serious threat to communication integrity and cybersecurity. In this paper, we propose a lightweight hybrid attention architecture named MBAAF for audio spoofing detection, which integrates feature-specific attention modules across a multi-branch input pipeline. Specifically, Mel-Frequency Cepstral Coefficients (MFCC) and Short-Time Fourier Transform (STFT) branches are refined using Temporal Gating Blocks (TGB), while Constant-Q Cepstral Coefficients (CQCC) are enhanced with a Convolutional Block Attention Module (CBAM). These refined features are fused and processed by a compact ResNeSt backbone for final classification. Experiments show that MBAAF attains 0.15% EER and 0.011 t-DCF on ASVSpoof2019 LA and sets a new state-of-the-art of 0.092% EER on the In-the-Wild dataset, using only 135K parameters. These results highlight the effectiveness of heterogeneous attention placement and confirm that a compact, modular design can achieve both high accuracy and efficiency for real-time, low-resource deployment.
ConvSelect-RAG: Bridging Query Enhancement and Document Filtering for Multi-turn Conversational AI
Khoa Tran Dang, Quan Thi Khac, Dang Le Binh, Duy Tran Ngoc Bao
Abstract
Multi-turn conversational AI systems using Retrieval Augmented Generation (RAG) often struggle with ambiguous user queries and inefficient retrieval from large document collections. Traditional chunk-level vector search introduces significant computational overhead, while missing critical context due to reliance on raw, under-specified queries. We propose ConvSelect-RAG, a novel three-stage framework that (1) enhances queries using conversation history, (2) pre-filters documents using metadata summaries before chunk-level retrieval, and (3) integrates both for context-aware response generation. Experiments on large conversational QA benchmarks demonstrate that ConvSelect-RAG reduces retrieval latency by 23.5%, improves response accuracy by 18.7%, and decreases overall computational overhead by 31.2% compared to existing RAG baselines. Our approach offers superior scalability for real-world applications by minimizing unnecessary searches and preserving contextual relevance, setting a new standard for efficient, accurate multi-turn conversational AI.
A Comprehensive Solution to Early Fault Detection in Continuous Integration Servers
Khuong Nguyen, Chau Vo
Abstract
Continuous Integration (CI) servers are widely-used and essential for automated software testing and deployment nowadays. To serve customer’s services smoothly and reliably, CI servers are expected to be fault-tolerant, leading to a critical need of their reliability and thus availability. On the other hand, they are complicated with many metrics that need to be monitored in real time from many different perspectives. Addressing this issue, several existing works have taken into account early fault detection in CI servers. However, it remains unsolved due to their proposed incomplete solutions and the challenges from the complex, dynamic nature of server metrics, prompting the development of our work. In this paper, we propose a comprehensive solution to early fault detection in CI servers. Our solution is based on machine learning and statistical methods for metric value prediction and then fault detection. For the first part, we define a new hybrid model by integrating AutoRegressive Integrated Moving Average (ARIMA)’s linear modeling with Random Forest’s capability to capture non-linear interactions. The novelty of our model is reflected by the stacking mechanism which effectively enhances its model components and further makes the model yield better prediction results than the others. For the latter, a combined statistical method is defined to accurately identify future faults associated with the server metrics. As a result, our solution is more effective while conserving computational resources for CI servers as evaluated on the real-world datasets including the metric values from CI server nodes across multiple global sites.
Balancing Accuracy and Latency in Privacy-Preserving User Behavior Classification with Tree-Based Models
Kiet Nguyen Tuan, Vo Minh Tri, Nguyen Duc Thai
Abstract
Fully Homomorphic Encryption (FHE) enables machine learning models to operate directly on encrypted data, ensuring privacy-preserving analytics for sensitive user behavior information. However, the computational overhead of FHE raises concerns about efficiency in real-time applications. In this study, we evaluate three representative tree-based classifiers on encrypted user behavior data. The experiments are conducted under different computation depths and quantization bit-widths to examine their influence on accuracy and inference latency. Our results show that while all three models can be executed within the FHE framework, XGBoost consistently outperforms the others, achieving superior predictive accuracy with inference times suitable for near real-time analysis. These findings indicate that XGBoost offers the most effective balance between accuracy and efficiency for privacy-preserving user behavior classification.
Physically-Grounded Goal Imagination: Physics-Informed Variational Autoencoder for Self-supervised Reinforcement Learning
Lan Thi Ha Nguyen, Kien Ton Manh, Anh Do Duc, Nam Pham Hai
Abstract
Self-supervised goal-conditioned reinforcement learning enables robots to autonomously acquire diverse skills without human supervision. However, a central challenge is the goal setting problem: robots must propose feasible and diverse goals that are achievable in their current environment. Existing methods like RIG (Visual Reinforcement Learning with Imagined Goals) use variational autoencoder (VAE) to generate goals in a learned latent space but have the limitation of producing physically implausible goals that hinder learning efficiency. We propose Physics-Informed RIG (PI-RIG), which integrates physical constraints directly into the VAE training process through a novel Enhanced Physics-Informed Variational Autoencoder (Enhanced \(p^3\)-VAE), enabling the generation of physically consistent and achievable goals. Our key innovation is the explicit separation of the latent space into physics variables governing object dynamics and environmental factors capturing visual appearance, while enforcing physical consistency through differential equation constraints and conservation laws. This enables the generation of physically consistent and achievable goals that respect fundamental physical principles such as object permanence, collision constraints, and dynamic feasibility. Through extensive experiments, we demonstrate that this physics-informed goal generation significantly improves the quality of proposed goals, leading to more effective exploration and better skill acquisition in visual robotic manipulation tasks including reaching, pushing, and pick-and-place scenarios.
A Fusion Model for Precipitation Nowcasting from Radar and Satellite Images
Le Hong Trang, Bui Khanh Vinh, Phan Thanh An, Pham Tran Vu
Abstract
Precipitation nowcasting is a crucial task with applications across various industries and services. Recent advancements in deep learning have enabled the use of radar imagery for precipitation nowcasting. This work proposes a fusion model that leverages both radar and satellite imagery to enhance precipitation nowcasting accuracy. Specifically, we employ a latent diffusion model called LDCast for local radar images and a large pretrained model called ClimaX for global satellite data. These models serve as baseline and foundation models, extracting temporal features. These features are then integrated within a U-Net architecture for final forecasting. We construct a local radar dataset, the Nha-Be Radar Dataset (NRD-1), and conduct experiments using NRD-1 as local data and ERA5 as global data. Despite being trained on a relatively small dataset, our model outperforms the baseline LDCast model. We further compare our model to a state-of-the-art precipitation nowcasting method to demonstrate its effectiveness.
- Title
- Multi-disciplinary Trends in Artificial Intelligence
- Editors
- Thanh Tho Quan
- Chattrakul Sombattheera
- Hoang-Anh Pham
- Ngoc Thinh Tran
- Copyright Year
- 2026
- Publisher
- Springer Nature Singapore
- Electronic ISBN
- 978-981-9549-57-3
- Print ISBN
- 978-981-9549-56-6
- DOI
- https://doi.org/10.1007/978-981-95-4957-3
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.