Multi-disciplinary Trends in Artificial Intelligence
18th International Conference, MIWAI 2025, Ho Chi Minh City, Vietnam, December 3–5, 2025, Proceedings, Part II
- 2026
- Book
- Editors
- Thanh Tho Quan
- Chattrakul Sombattheera
- Hoang-Anh Pham
- Ngoc Thinh Tran
- Book Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Singapore
About this book
This 3-volume set constitutes the proceedings of the 18th International Conference on Multi-disciplinary Trends in Artificial Intelligence, MIWAI 2025, held in Ho Chi Minh City, Vietnam, during December 3–5, 2025.
The 110 full papers presented in these proceedings were carefully reviewed and selected from 306 submissions. The papers focus on various topics in AI and its applications, such as deep learning, machine learning, computer vision, pattern recognition, and natural language processing.
Table of Contents
IUFlowGen: An AI System for Converting Procedural Texts into Flowcharts
Nhat-Khiem Nguyen, Thanh-Tung Tran
Abstract: Procedural documents are common in domains such as technical operations and legal compliance, yet their unstructured and complex logic often hinders comprehension. Converting such texts into flowcharts improves clarity and reduces cognitive load, but existing AI systems typically fall short: they produce static outputs without clarification, fail to adapt to varying document complexity, and lack a principled way to determine when automation is genuinely helpful. To address these challenges, we present IUFlowGen, an AI-assisted, human-in-the-loop system that combines retrieval-augmented generation, LLM reasoning, and graph-based modeling to generate structured and interactive flowcharts. We also introduce a ten-factor rubric to quantify procedural document complexity, enabling adaptive support based on document difficulty. Experiments on 30 documents of varying complexity show that IUFlowGen achieves high accuracy and completeness across all ten factors, demonstrating broader coverage than prior systems. By integrating interactivity, complexity assessment, and clarification, IUFlowGen provides a practical and effective solution for improving user comprehension of complex procedural content.
Compact Yet Powerful: Group Query Attention in TinyViT Student Models for Efficient Classifications
Nishitha Anand, Rachit Verma, Bhaskarjyoti Das
Abstract: Vision Transformers (ViTs) perform well in computer vision but need high computational power. This paper explores Grouped Query Attention (GQA) to reduce parameters in TinyViT models with competitive accuracy. We compare GQA settings with query-to-key ratios of 2:1 to 10:1 on CIFAR-10 and CIFAR-100 benchmarks. Findings show that GQA decreases attention parameters by a maximum of 42.8% with no loss of accuracy. The GQA 5:1 setup reaches 89.95% accuracy in CIFAR-10, surpassing the usual MHA’s 89.37% while shrinking attention parameters by 39.9%. These results prove GQA’s viability for deploying effective vision transformers within resource-limited environments.
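To make the parameter saving concrete, the following is a minimal sketch of how sharing key/value heads across query groups shrinks the attention projections. It counts weight matrices only (no biases), and the model width of 320 with 10 query heads is a hypothetical configuration chosen for illustration, not the paper's TinyViT setup. Under these assumptions a 5:1 query-to-key ratio removes 40% of the attention weights, the same order as the 39.9% quoted in the abstract.

```python
def attention_params(d_model: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Count projection weights (biases ignored) in one attention layer."""
    if d_model % n_q_heads or n_q_heads % n_kv_heads:
        raise ValueError("head counts must divide evenly")
    head_dim = d_model // n_q_heads
    q_and_out = 2 * d_model * d_model               # W_q and W_o keep full width
    k_and_v = 2 * d_model * head_dim * n_kv_heads   # W_k and W_v shrink with fewer KV heads
    return q_and_out + k_and_v


def reduction_vs_mha(d_model: int, n_q_heads: int, n_kv_heads: int) -> float:
    """Fraction of attention weights saved relative to standard multi-head attention."""
    mha = attention_params(d_model, n_q_heads, n_q_heads)
    gqa = attention_params(d_model, n_q_heads, n_kv_heads)
    return 1 - gqa / mha


# Hypothetical layer: width 320, 10 query heads, 2 key/value heads (ratio 5:1).
print(f"{reduction_vs_mha(320, 10, 2):.1%}")  # prints 40.0%
```

In this simplified count the saving is (1 - 1/r) / 2 for a ratio of r:1, so it can never exceed 50%, because the query and output projections are untouched.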
Potato Leaf Disease Classification in Uncontrolled Environments: Leveraging the Synergy of Handcrafted Features
Phi-Hung Hoang, Thi-Thu-Hong Phan
Abstract: Accurate classification of potato leaf diseases under uncontrolled conditions is challenging due to lighting variability, background clutter, and class imbalance. This study presents a lightweight and interpretable machine learning pipeline based on handcrafted feature engineering, specifically designed to address the challenges of uncontrolled imaging conditions in potato leaf disease classification. We systematically extract diverse feature types, including color statistics, color histograms, and BoVW-based texture descriptors (SIFT, KAZE), and integrate them to form a comprehensive representation. To improve minority class recognition, Borderline-SMOTE is applied during training. Experimental results on a real-world potato leaf disease dataset demonstrate that the proposed approach achieves 81.03% accuracy using the LightGBM classifier, outperforming several state-of-the-art deep learning models. These results highlight the effectiveness of carefully engineered features and their integration for real-world plant disease classification.
Telecom Revenue Prediction over Time Series with Pre-trained Language Models
Phuoc Lien, Nam Thoai, Vinh Phan
Abstract: Time series forecasting plays a critical role in many domains, yet existing methods often demand domain-specific expertise and large volumes of historical data. In this work, we introduce TELE-PLM, a novel framework that adapts pre-trained language models (PLMs) for time series forecasting, with a particular focus on telecommunication revenue prediction. Our approach transforms raw time series into text-based prototype representations and incorporates a Prompt-as-Prefix mechanism to align continuous temporal signals with the discrete token space of PLMs. We construct and release a clean, structured dataset derived from MobiFone’s transaction records, enabling reproducible research in telecom forecasting. Extensive experiments compare TELE-PLM using BERT, GPT-2, and T5 against strong baselines such as TimesNet and TimeMixer. Results demonstrate that TELE-PLM significantly outperforms traditional time series models, with T5 achieving the lowest forecasting error (MASE = 2.6846), highlighting the advantage of leveraging text-to-text architectures for temporal prediction. These findings underscore the potential of PLMs as generalizable and data-efficient forecasters. Future work will explore multi-step forecasting, cross-domain adaptation, and hybrid architectures that integrate PLMs with temporal inductive biases for enhanced performance.
Domain Adaptation of Federated Learning by Data Generation and Server Feedback
Phuong-Anh Vu, Kim-Tinh Phan, Cao-Dien Nguyen, Tien-Dung Cao, Le Trieu Phong, Ngoc-Thai Nguyen
Abstract: Federated learning (FL) enables collaborative model training across distributed clients without requiring centralized access to local data. However, it faces significant challenges under domain shift, where discrepancies between client data distributions and deployment environments might degrade model performance. In this paper, we introduce a novel feedback-driven framework for enhancing domain adaptation in federated settings. Our approach incorporates server-side domain analysis to detect distributional shifts during validation and generates lightweight feedback signals such as Gradient-weighted Class Activation Mapping (Grad-CAM) and SHapley Additive exPlanations (SHAP) to highlight domain-specific patterns. These signals are transmitted to clients as adaptation cues, enabling more targeted local training. In addition, the framework supports client-side data generation using modality-appropriate generative models, including variational autoencoders for tabular data, diffusion models for images, and language models for text. This data generation further mitigates heterogeneity and strengthens model generalization. By combining server-guided feedback with client-side adaptation and data augmentation, our method significantly improves performance under domain shift. Extensive experiments demonstrate consistent gains over standard FL baselines across diverse tasks and data modalities.
Causal Temporal Transformer: An Integrated Framework for Temporal Causal Discovery and Multi-target Prediction
Quang Le, Sepideh Mousazadeh, Francois Chan, Claude D’Amours, Il-Min Kim
Abstract: This paper introduces the Causal Temporal Transformer (CTT), an end-to-end trainable transformer-based model designed for the dual tasks of causal time-series discovery and multi-target prediction. At its core, CTT employs a causal attention mechanism to distinguish and emphasize causal components while suppressing non-causal ones, thereby improving interpretability and feature relevance. Furthermore, to capture temporal dependencies with greater precision, the Lag Temporal (LT) embedding is incorporated, which facilitates the analysis of lagging characteristics intrinsic to the data. Notably, CTT can generate a causal graph that elucidates the potential causal relationships among the targeted variables and the prediction sequence, thereby enriching the interpretability of the model’s outputs. Experiments on Netsim, Kuramoto, and Lorenz-96 benchmarks demonstrate CTT’s superior causal time-series discovery capabilities, outperforming recent methods. Evaluations on ETH-UCY, nuScenes, and TrajAir datasets further show significant gains in multi-target prediction accuracy over baselines. Overall, CTT offers a robust, interpretable framework aligning inferred causal attributions with underlying drivers of multi-target time-series data.
When Self-supervised Transformers Meet Knowledge Distillation: Efficient Chest X-Ray Classification
Quoc-Khang Tran, Nguyen-Khang Pham
Abstract: Chest X-ray classification is limited by scarce annotations and the heavy cost of transformer models. We propose a three-stage framework: (1) self-supervised pretraining of a Vision Transformer (ViT) with DINOv2 on 880k unlabeled radiographs, (2) fine-tuning on ChestX-ray14, and (3) knowledge transfer into MobileViT using a combination of Binary Cross-Entropy (BCE) and Multi-Label Distillation (MLD) loss. The distilled MobileViT achieves a mean AUROC of 0.8404, surpassing its supervised counterpart by 1.9% while requiring only 0.31 GFLOPs. These results highlight the benefit of domain-specific self-supervised pretraining and demonstrate that BCE and MLD together effectively transfer knowledge from large ViTs into compact models. The outcome is a lightweight yet accurate system that can be applied in clinical and resource-constrained healthcare environments.
ViPPS: Building a Multimodal Dataset for Physics Problem Solving in Vietnamese
Quynh T. N. Vo, Xinh T. Le, Thao H. M. Tran, Tho T. Quan
Abstract: Recent advances in multimodal large models, particularly Vision-Language models (VLMs), have demonstrated strong capabilities in handling reasoning tasks that involve both textual and visual information. However, in low-resource languages such as Vietnamese, research in educational contexts remains limited due to the absence of suitable multimodal datasets. To address this gap, we introduce ViPPS (Vietnamese Physics Problem Solving), the first multimodal dataset designed for physics problem solving in Vietnamese. ViPPS consists of nearly 500 physics-related images (e.g., circuit diagrams, mechanics illustrations, experimental setups, and graphs) paired with more than 5,500 corresponding textual questions collected from Vietnamese educational Q&A platforms. Each entry reflects authentic student learning scenarios in secondary and high school physics. We detail the dataset creation pipeline and cleaning procedures, and provide comprehensive statistical analyses of linguistic and visual characteristics. To benchmark ViPPS, we fine-tune recent VLM baselines under a supervised multimodal setting and report their performance across different physics domains. Results highlight both the potential of current VLMs and their limitations in solving domain-specific multimodal reasoning problems in Vietnamese. We expect ViPPS to serve as a valuable resource for advancing research on multimodal learning, physics education, and AI for low-resource languages.
Sparsity Driven Multi-Agent Reinforcement Learning for Urban Traffic Signal Optimization
Raghava Morusupalli, D. Teja Santosh, Jyothirmai Joshi
Abstract: Urban traffic congestion remains a critical challenge, often worsened by inefficient and frequent signal changes at intersections. Traditional reinforcement learning (RL) methods optimize for cumulative reward but neglect the cost of excessive actions, leading to unstable signal control. This instability is attributed to the distributed nature of the underlying computational problem. The vehicular arrival pattern varies from junction to junction, which makes it difficult for vehicles to pass through each junction without encountering a red signal. These patterns are observed by reinforcement learning agents, which coordinate with each other. We propose a Greedy Action-Minimized Reinforcement Learning (GAM-RL) framework that integrates a sparsity-inducing penalty inspired by Basis Pursuit into the Q-learning objective. Each traffic intersection is modelled as an independent RL agent that observes lane-level congestion and learns when to change or retain signal phases. The Bellman update is modified to include an ℓ₁-regularized term, discouraging unnecessary phase switches and promoting control stability. Experiments conducted over 300 episodes demonstrate that GAM-RL achieves higher average cumulative rewards than standard RL while reducing the number of signal changes per episode by a significant margin. The results confirm that GAM-RL balances flow efficiency and actuation cost, making it suitable for real-world traffic systems where stability and fairness are as important as throughput.
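The idea of an ℓ₁-penalized Bellman update can be illustrated with a small tabular sketch. The abstract does not give the exact form of the modified update, so everything here is an assumption: actions are phase indices, the penalty is lam * |action - prev_action| subtracted from the reward, and the function name and hyperparameters are hypothetical. It is not the GAM-RL implementation.

```python
from collections import defaultdict


def penalized_q_update(q, state, action, prev_action, reward, next_state,
                       n_actions=2, alpha=0.1, gamma=0.9, lam=0.5):
    """One tabular Q-learning step with an assumed l1 switching penalty."""
    # Penalty is zero when the phase is retained, lam * |difference| otherwise.
    switch_cost = lam * abs(action - prev_action)
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    target = reward - switch_cost + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
keep = penalized_q_update(q, "s0", action=0, prev_action=0, reward=1.0, next_state="s1")
switch = penalized_q_update(q, "s0", action=1, prev_action=0, reward=1.0, next_state="s1")
print(keep, switch)  # with equal rewards, the switching action receives the lower value
```

With identical rewards, the action that changes the phase ends up with a smaller Q-value, so a greedy agent switches only when the expected gain outweighs the penalty.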
A Fast AI-Powered Algorithm for Robotic Wheelchair Navigation in Obstacle-Rich Environments
Shimon Aviram, Eugene Levner, Dmitry Tsadikovich
Abstract: A robotic wheelchair can be considered a primary means of transportation for people with limited mobility, allowing them to reach their destination safely and quickly. Using an effective pathfinding algorithm in combination with artificial intelligence-based devices installed on a wheelchair can significantly improve navigation. The purpose of this paper is twofold: (a) to develop an effective pathfinding algorithm, called D-PRM, that can be applied in obstacle-rich environments and (b) to create a mathematical decision-support tool for selecting AI-based devices to be installed on a wheelchair. To achieve these goals, we construct an efficient pathfinding algorithm that combines the strengths of the Probabilistic Roadmap (PRM) and A* methods and formulate a mixed integer mathematical programming model for AI-based device selection. Extensive computational experiments validate the performance of the proposed algorithm.
MHEIClosed: An Efficient Algorithm for Mining Closed High-Efficiency Itemsets
Linh T. T. Nguyen, Thang G. Phung, Vinh Q. Pham, Loan T. T. Nguyen
Abstract: High-Efficiency Itemset Mining (HEIM) is an emerging extension of utility-based mining that considers utility and investment, addressing key limitations of High Utility Itemset Mining (HUIM). However, existing HEIM methods often produce large and redundant result sets, complicating interpretation and post-analysis. To address this, we propose Closed High-Efficiency Itemset Mining (CHEIM), which focuses on extracting concise and non-redundant patterns while preserving essential utility and investment information. This paper presents MHEIClosed, a novel algorithm for efficiently mining closed high-efficiency itemsets. MHEIClosed incorporates advanced techniques from closed itemset and HEIM approaches, including pruning strategies based on tighter upper bounds (ssef and slef), transaction merging, and efficient closure checking. Extensive experiments on several real-world datasets demonstrate that MHEIClosed significantly outperforms the state-of-the-art HEPMClosed algorithm in terms of runtime and memory consumption, particularly on dense or large-scale datasets. These results validate MHEIClosed’s scalability and effectiveness in mining meaningful patterns, making it well-suited for practical data mining applications.
A Two-Stage Fuzzy-Guided Genetic Algorithm for University Timetabling with LLM-Based Preference Parsing
Tam M. Nguyen, Tung T. Nguyen, Tho T. Quan
Abstract: University course timetabling is a well-known NP-hard combinatorial optimisation problem that involves numerous hard constraints and softer, lecturer-specific preferences. In this study, we propose a two-stage fuzzy-guided genetic algorithm for generating high-quality university schedules, addressing both lecture and laboratory sessions while maintaining their logical dependencies. In the first stage, we construct lecture timetables by enforcing all hard constraints during encoding, while the fitness function evaluates the degree of soft constraint violations, particularly vague lecturer time preferences. These preferences, initially expressed in natural language, are interpreted using a large language model (LLM) and transformed into fuzzy satisfaction scores over time slots. In the second stage, laboratory sessions are scheduled based on the previously generated lecture timetable, ensuring alignment in timing and avoiding conflicts with lecture sessions. We evaluate our method using real data from the Faculty of Computer Science and Engineering at Ho Chi Minh City University of Technology (CSE@HCMUT). The results show that our method consistently produces conflict-free schedules while aligning closely with lecturer preferences, demonstrating the effectiveness of combining LLM-based interpretation and fuzzy evaluation within a genetic optimization framework.
Spectral Curvature Signature: A Frequency-Domain Descriptor for Driving Behavior Classification
Do Thanh Thai, Quang Tran Minh
Abstract: Understanding and classifying vehicle driving behavior is critical for the development of intelligent transportation systems (ITS) and road safety analytics. In this paper, we propose a novel frequency-domain descriptor called the Spectral Curvature Signature (SCS), which characterizes the temporal dynamics of vehicle trajectory curvature through spectral and fractal features. The proposed framework processes raw (x, y) trajectories using a Savitzky–Golay filter, computes curvature over time, and applies a Fourier Transform to extract a set of compact, interpretable descriptors, including spectral energy, centroid frequency, Hurst exponent, and Band Energy Ratios (BER). We evaluate the effectiveness of SCS on both synthetic trajectories generated from a kinematic model and real-world data from the US Highway 101 Dataset (NGSIM). After applying random undersampling to address label imbalance, the classification model achieves an accuracy of 99.8%, precision of 99.9%, recall of 99.7%, and an F1-score of 99.8% in distinguishing between safe and unsafe driving behaviors using a Random Forest classifier. Feature analysis reveals spectral energy as the most discriminative metric, while the Hurst exponent offers limited utility due to the short duration of observed sequences. The method also demonstrates robustness to data noise and class imbalance. Overall, SCS provides a principled, interpretable, and low-dimensional alternative to traditional time-domain motion descriptors, with strong potential for integration into large-scale traffic monitoring and driver behavior analytics systems.
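The curvature-then-spectrum pipeline described in this abstract can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' code: the Savitzky–Golay smoothing, Hurst exponent, and band energy ratios are omitted; curvature uses central finite differences; and spectral energy and centroid frequency follow common textbook definitions (sum of squared DFT magnitudes, and the power-weighted mean frequency) that the abstract does not spell out. Function names are hypothetical.

```python
import cmath
import math


def curvature(xs, ys):
    """Signed curvature at interior points via central finite differences."""
    kappa = []
    for i in range(1, len(xs) - 1):
        dx = (xs[i + 1] - xs[i - 1]) / 2.0
        dy = (ys[i + 1] - ys[i - 1]) / 2.0
        ddx = xs[i + 1] - 2.0 * xs[i] + xs[i - 1]
        ddy = ys[i + 1] - 2.0 * ys[i] + ys[i - 1]
        speed_sq = dx * dx + dy * dy
        kappa.append((dx * ddy - dy * ddx) / speed_sq ** 1.5 if speed_sq > 0 else 0.0)
    return kappa


def spectral_features(signal, dt=1.0):
    """Return (spectral energy, centroid frequency) from a one-sided DFT."""
    n = len(signal)
    power, freqs = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(signal))
        power.append(abs(coeff) ** 2)
        freqs.append(k / (n * dt))
    energy = sum(power)
    centroid = sum(f * p for f, p in zip(freqs, power)) / energy if energy > 0 else 0.0
    return energy, centroid


# Synthetic check: an arc of a circle with radius 2 has curvature close to 1/2.
step = 0.05
xs = [2.0 * math.cos(i * step) for i in range(64)]
ys = [2.0 * math.sin(i * step) for i in range(64)]
kappa = curvature(xs, ys)
energy, centroid = spectral_features(kappa)
print(round(kappa[0], 3), centroid < 1e-6)
```

A steady arc yields an almost constant curvature signal, so nearly all spectral energy sits at zero frequency; erratic steering would push energy, and therefore the centroid, toward higher frequencies.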
Adaptive Urban Traffic Signal Control via Lyapunov Optimization, LSTM-Based Forecasting, and Shock-Aware Phase Re-Service
Do Thanh Thai, Quang Tran Minh
Abstract: This paper presents a novel traffic signal control methodology that combines Lyapunov Drift-Plus-Penalty (DPP) optimization with a shock-aware phase re-service mechanism and LSTM-based traffic forecasting to achieve both queue stability and delay minimization at urban intersections. The proposed controller dynamically reallocates green time in response to evolving queue lengths, particularly prioritizing oversaturated approaches to prevent congestion build-up and mitigate traffic shockwaves. LSTM networks are employed to capture real-time temporal patterns in traffic flow, enabling more responsive and data-driven decision-making. The framework is grounded in queueing theory and incorporates Kingman’s delay approximation to model system dynamics, while theoretical analysis ensures strong stability and provable near-optimal delay under admissible traffic demand. Extensive simulations were conducted across two scenarios. In the first, the proposed method was compared with traditional fixed-time control, achieving a 17.21% reduction in average waiting time, a 6.97% decrease in time loss, and a 2.35% reduction in average trip duration, with fewer teleport events and improved simulation efficiency. In the second scenario, compared to a baseline without shock-awareness, the proposed method reduced the average queue length by approximately 93.9% (from 263.54 to 16.09 vehicles) and the total queue delay by 93.9% (from 1054.17 to 64.38 vehicle·slots), with 2,064 targeted shock re-allocations contributing to improved traffic stability. These results confirm the method’s effectiveness in both proactive congestion management and reactive traffic surge mitigation. This work contributes a theoretically grounded and practically viable adaptive signal control strategy, enhanced by real-time sequence modeling, with strong potential for scalable deployment in smart urban traffic systems.
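As a quick arithmetic check on the second scenario, the quoted reductions follow directly from the before and after values given in the abstract. The helper name below is hypothetical; the four numbers are taken from the text.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


queue = percent_reduction(263.54, 16.09)    # average queue length, vehicles
delay = percent_reduction(1054.17, 64.38)   # total queue delay, vehicle·slots
print(f"{queue:.1f}% {delay:.1f}%")  # prints 93.9% 93.9%
```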
Revolutionizing Precise Low Back Pain Diagnosis via Contrastive Learning
Thanh Binh Le, Hoang Nhat Khang Vo, Tan Ha Mai, Trong Nhan Phan
Abstract: Low back pain affects millions worldwide, driving the need for robust diagnostic models that can jointly analyze complex medical images and accompanying text reports. We present LumbarCLIP, a novel multimodal framework that leverages contrastive language-image pretraining to align lumbar spine MRI scans with corresponding radiological descriptions. Built upon a curated dataset containing axial MRI views paired with expert-written reports, LumbarCLIP integrates vision encoders (ResNet-50, Vision Transformer, Swin Transformer) with a BERT-based text encoder to extract dense representations. These are projected into a shared embedding space via learnable projection heads (configurable as linear or non-linear) and normalized to facilitate stable contrastive training using a soft CLIP loss. Our model achieves state-of-the-art performance on downstream classification, reaching up to 95.00% accuracy and 94.75% F1-score on the test set, despite inherent class imbalance. Extensive ablation studies demonstrate that linear projection heads yield more effective cross-modal alignment than non-linear variants. LumbarCLIP offers a promising foundation for automated musculoskeletal diagnosis and clinical decision support.
Toward Sustainable and Cost-Efficient HPC Systems: A Deep Reinforcement Learning Job Scheduling Approach
Thanh Hoang Le Hai, Nhan Nguyen Phuc, Mai Nguyen Tran Phuong, Minh Bui Ngoc, Nam Thoai
Abstract: Hybrid-powered High Performance Computing (HPC) centers can reduce both operational costs and carbon footprints by aligning job execution with renewable supply and time-varying electricity prices. However, existing Deep Reinforcement Learning (DRL) schedulers often neglect price signals, rely on synthetic energy traces, or conflate job selection and delay, limiting their real-world applicability. This work introduces a cost-aware, multi-action DRL framework that embeds a hybrid energy cost model into an HPC scheduling environment, enabling agents to incorporate dynamic price signals and renewable forecasts while decoupling job selection from execution delay. Evaluated on production-scale workload traces and measured renewable profiles, the proposed scheduler consistently reduces energy costs, improves renewable utilization, and maintains competitive job responsiveness compared with heuristic and learning-based baselines. These results demonstrate a practical path toward economically and environmentally sustainable HPC operations.
Planning to Prove: Improving Informal Proofs of Olympiad Inequalities from Large Language Models
Le Van Thanh, Do Xuan Trong, Pham Duc Tinh, Hai Van Pham
Abstract: Recent advancements in Large Language Models (LLMs) have enabled them to perform exceptionally well on diverse natural language processing and reasoning challenges, with emergent abilities that increasingly mimic human cognition. While LLMs show promise in general mathematical problem-solving, their ability to generate informal, step-by-step proofs for Olympiad-level theorems that are both logically sound and human-verifiable remains underexplored. Existing research has primarily focused on formal theorem proving, which depends on strict symbolic systems and external verifiers, often neglecting the nuanced demands of informal mathematical reasoning. In this paper, we introduce Planning to Prove, a prompt-based framework designed to guide LLMs in producing coherent, step-by-step informal proofs for Olympiad-level inequalities. This approach leverages the inherent capacity of LLMs for multi-step reasoning and planning, enabling the model to first construct a structured outline of the proof before generating the detailed argument. The proposed method is evaluated on a curated set of inequality problems from Mathematical Olympiads and demonstrates that it improves both the correctness and interpretability of the generated proofs. The results highlight the critical role of structured reasoning in informal mathematical domains and suggest promising directions for enhancing LLM performance in advanced mathematical reasoning tasks.
REBot: From RAG to CatRAG with Semantic Enrichment and Graph Routing
Thanh Ma, Tri-Tam La, Lam-Thu Le Huu, Minh-Nghi Nguyen, Khanh-Van Pham Luu
Abstract: Academic regulation advising is vital for helping students interpret and comply with institutional policies, yet building effective systems requires domain-specific regulatory resources. To address this challenge, we propose REBot, an LLM-enhanced advisory chatbot powered by CatRAG, a hybrid retrieval–reasoning framework that integrates RAG with GraphRAG. CatRAG unifies dense retrieval and graph-based reasoning, supported by a hierarchical, category-labeled knowledge graph enriched with semantic features for domain alignment. A lightweight intent classifier routes queries to the appropriate retrieval modules, ensuring both factual accuracy and contextual depth. We construct a regulation-specific dataset and assess REBot on classification and question-answering tasks, achieving state-of-the-art performance with an F1-score of 98.89%. Finally, we implement a web application that demonstrates the practical value of REBot in real-world academic advising scenarios.
- Copyright Year
- 2026
- Publisher
- Springer Nature Singapore
- Electronic ISBN
- 978-981-9549-60-3
- Print ISBN
- 978-981-9549-59-7
- DOI
- https://doi.org/10.1007/978-981-95-4960-3
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.