
Multi-disciplinary Trends in Artificial Intelligence

18th International Conference, MIWAI 2025, Ho Chi Minh City, Vietnam, December 3–5, 2025, Proceedings, Part I

  • 2026
  • Book

About this book

This three-volume set constitutes the proceedings of the 18th International Conference on Multi-disciplinary Trends in Artificial Intelligence, MIWAI 2025, held in Ho Chi Minh City, Vietnam, during December 3–5, 2025. The 110 full papers presented in these proceedings were carefully reviewed and selected from 306 submissions. The papers focus on various topics in AI and its applications, such as deep learning, machine learning, computer vision, pattern recognition, and natural language processing.

Table of Contents

Frontmatter
Topological Activation Maps for Visual Representation Learning from Tabular Data

The transformation of non-image data into visual representations suitable for deep learning remains a challenging frontier in machine learning. Many existing methods map tabular data to images via global feature-space layouts, which can reduce instance-specific nuance. We propose Topological Activation Maps (TAMs), a new framework that couples global feature topology with per-instance activations for more faithful and interpretable data transformations. TAMs use a two-phase embedding: (i) kernel projection and Self-Organizing Map training establish a prototype grid capturing global feature relationships; (ii) each sample generates a unique activation map by interacting with this grid through Gaussian-weighted distances. This design preserves both dataset-level structure and sample-specific characteristics. On benchmark classification datasets, TAMs deliver competitive or superior accuracy compared with strong ensemble baselines while offering a complementary perspective on interpretability.

M. Achutha, Bhaskarjyoti Das
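The per-instance activation step described in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the prototype grid here is random (standing in for the kernel-projected, SOM-trained grid of phase (i)), and the grid size and `sigma` are made-up values; only the Gaussian-weighted-distance idea of phase (ii) is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained SOM: in the paper, prototypes come
# from kernel projection + Self-Organizing Map training; here they are
# random, purely to illustrate the activation-map computation.
grid_h, grid_w, n_features = 5, 5, 8
prototypes = rng.normal(size=(grid_h * grid_w, n_features))

def activation_map(x, prototypes, sigma=1.0):
    """Gaussian-weighted distances from sample x to every grid prototype."""
    d2 = ((prototypes - x) ** 2).sum(axis=1)   # squared distance to each prototype
    act = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian weighting
    return act.reshape(grid_h, grid_w)         # one 2-D "image" per sample

x = rng.normal(size=n_features)
tam = activation_map(x, prototypes)            # sample-specific activation map
```

Each sample thus yields its own 2-D map: dataset-level structure lives in the shared grid, while the activations encode the individual instance.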
Robust Leaf Disease Classification via Deep Feature Concatenation and EfficientNetV

Plants play a vital role in global food security, ecological balance, and economic sustainability. However, their susceptibility to a wide range of leaf diseases poses a major threat to crop yield and quality. Early and accurate identification of these diseases is essential for effective management. Manual inspection methods, though widely used, are inefficient, subjective, and unsuitable for large-scale agricultural operations. To address this challenge, we propose a deep learning-based classification system that significantly improves disease recognition performance. The model is designed to classify leaf images from six plant species—including both healthy and diseased samples—across 14 distinct categories. Our architecture leverages EfficientNetV2B0 as the feature extractor, enhanced with custom dense layers and feature concatenation to capture subtle visual cues. Through preprocessing and balanced dataset splitting, combined with training techniques such as dropout, adaptive learning rate scheduling, and early stopping, the model achieves robust generalization. Experimental evaluation shows that our approach improves classification accuracy from a baseline of 90.18% to 92.86%.

Ai My Thi Nguyen, Hoang Huy Le, Vinh Dinh Nguyen
Shallow Transformers with Applications Towards Image and Text Classification

The Transformer architecture has quickly become extremely popular, as it has achieved state-of-the-art performance in a variety of tasks with a relatively simple design of repeating blocks. Variants of Transformers are now staples in many classification tasks, including language modeling, image classification, and even object detection. The core aspect of the architecture, namely the sequential repeating blocks, has however remained unchanged throughout this time. In this work, we explore an alternative horizontally growing architecture that achieves similar results on the common tasks in which Transformers are proficient, while providing more controllability for parameter expansion due to the model’s shallow nature. We compare with two standard models: BERT for natural language processing and ViT for computer vision. We show that our model achieves comparable results while maintaining very low depth, in some cases with just a single layer. To the best of our knowledge, this is the first study that demonstrates the possibility and efficacy of such models. We provide results on standard benchmarks, i.e., MNLI for natural language inference, and CIFAR-10 and CIFAR-100 for image classification. We also provide all the source code for our experiments. (The code can be found at https://github.com/akshaybadola/shallow-transformers .)

Akshay Badola, Vineet Padmanabhan, Rajendra Prasad Lal, Wilson Naik
Computational Analysis of Cyber Risk Management by AI Coaching of Network DDoS Attack Activity in European Industries

This project investigates cyber risk management during Distributed Denial of Service (DDoS) attacks in European industries, which impact many critical sectors such as finance, telecommunications, and technology. Using adaptive network modeling and MATLAB, a model was developed to simulate the interaction between human decision-making and AI coaching under stress. The AI Coach, equipped with complete situational awareness, monitors employee behavior and intervenes when errors are detected. Through what-if analysis and risk assessment, it was found that factors such as detection speed and human error influence financial loss outcomes. The simulation results demonstrate that AI-supported intervention can significantly reduce system vulnerability by correcting human misjudgments in real time.

Aleksander Szymczak, Jan Treur, Peter H. M. P. Roelofsma
Learning to Defer with Scoring Functions

Machine learning models in high-stakes applications often collaborate with human experts, requiring intelligent deferral mechanisms. While traditional rejection learning uses model uncertainty to decide deferrals, it ignores human error and lacks flexibility in coverage control. Learning-to-defer approaches address this by jointly optimizing a classifier and a rejector, using loss functions that account for both model and human performance. However, existing methods either disregard human resource constraints or require costly retraining when coverage needs change. We explore a scoring-based approach to learning to defer that addresses these limitations. Building on prior heuristic methods, we introduce a novel metric quantifying how effectively a deferral system leverages both model and human expertise across the feature space. We then derive the theoretically optimal deferral rule and develop a practical scoring function approximation that enables post-hoc coverage adjustment without retraining. Our method trains a scoring function to rank samples by their expected delegation benefit, which can be calibrated to meet dynamic coverage constraints. Experiments show this approach achieves superior accuracy-workload trade-offs compared to existing methods, providing both theoretical grounding and practical flexibility for human-AI collaboration.

Andrew Ponomarev
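The post-hoc coverage adjustment described in the abstract above can be sketched in a few lines. This is an illustration of the general scoring idea, not the paper's trained scoring function: samples are ranked by a toy "delegation benefit" (estimated human minus model probability of being correct) and the top fraction allowed by the coverage budget is deferred, with no retraining needed when the budget changes.

```python
import numpy as np

def defer_mask(p_model_correct, p_human_correct, coverage):
    """coverage = fraction of samples the model must handle itself."""
    score = p_human_correct - p_model_correct        # toy delegation benefit
    n_defer = int(round((1.0 - coverage) * len(score)))
    order = np.argsort(-score)                       # most beneficial first
    mask = np.zeros(len(score), dtype=bool)
    mask[order[:n_defer]] = True                     # True -> send to human
    return mask

# Illustrative per-sample correctness estimates (made-up numbers).
p_model = np.array([0.9, 0.55, 0.6, 0.95, 0.4])
p_human = np.array([0.7, 0.9, 0.8, 0.6, 0.9])
mask = defer_mask(p_model, p_human, coverage=0.6)    # defer 40% = 2 samples
```

Changing `coverage` simply moves the cut-off along the ranking, which is exactly the kind of dynamic coverage constraint the abstract highlights.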
Prediction of Pregnancy-Related Adverse Drug Reactions from Chemical Conformers Using a Fractional-Pooling Dilated CNN

Adverse drug reactions (ADRs) during pregnancy represent a critical concern, as they can adversely affect both maternal and fetal health. However, the availability of clinical evidence on drug safety in this population has remained limited, primarily due to the ethical restrictions associated with conducting controlled trials in pregnant women. A range of computational approaches has been proposed to address this gap. Nonetheless, the majority of these methods have relied on one-dimensional or two-dimensional molecular descriptors, thereby neglecting the richer structural information contained within chemical conformers. In this work, we propose FracPool-DCNN, a novel deep learning architecture that integrates dilated convolutions with fractional max pooling to predict pregnancy-related ADRs directly from conformer images. Using a curated dataset of drugs from PubChem conformers and ADReCS-based annotations, the model has been trained and evaluated with five-fold cross-validation. FracPool-DCNN has achieved superior performance compared to nine baseline models, with a harmonic mean of 76.89%, AUPR of 77.42%, and ROC-AUC of 79.89%, while ablation studies confirm the critical contributions of fractional pooling and global average pooling. These findings highlight the promise of conformer-based deep learning for robust pregnancy drug-safety classification, offering a scalable approach to preclinical risk assessment.

Anushka Chaurasia, Deepak Kumar, Yogita Yogita
Compositional Distributional Semantics and the Conjunction Effect in Language Models

Compositional distributional semantics in natural language processing allows us to compose meaning vectors for complex phrases from those for word vectors. Here we apply the framework of compositional distributional semantics to construct meaning vectors for bank teller and feminist bank teller in the well-known Linda problem, and compare them via cosine similarity to a meaning vector encoding the Linda description provided in the problem. In a noun vector space, the meaning vector for the conjoined phrase feminist bank teller is substantially closer to the meaning vector for the Linda description than the meaning vector for the occupation-only phrase bank teller; crucially, this effect requires the full adjective-as-matrix treatment provided by compositional distributional semantics. In this way, we can provide a computational linguistic account of the conjunction effect in the Linda problem.

Yoshihiro Maruyama, Arisa Yasuda
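The adjective-as-matrix composition mentioned above can be shown with a tiny NumPy sketch. All vectors and the matrix here are random stand-ins (real models learn them from corpus statistics), so no conjunction effect should be read off these numbers; the sketch only shows the mechanics of composing and comparing meaning vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
bank_teller = rng.normal(size=3)       # noun as a vector (made-up 3-D space)
feminist = rng.normal(size=(3, 3))     # adjective as a matrix
linda = rng.normal(size=3)             # vector encoding the Linda description

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Conjoined phrase: apply the adjective matrix to the noun vector.
feminist_bank_teller = feminist @ bank_teller

sim_plain = cos(bank_teller, linda)
sim_conjoined = cos(feminist_bank_teller, linda)
```

In the paper's setting, the comparison of these two similarities is what yields the computational account of the conjunction effect.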
Quantum Patches for Efficient Learning

The problem of conserving computational resources while training deep learning models has become urgent as models increasingly require more input data to improve accuracy. To enhance accuracy beyond increasing data, researchers have leveraged quantum properties, such as using random quantum circuits (RQCs) to transform input data, similar to how data augmentation techniques are applied, and have demonstrated their effectiveness. However, the use of RQCs introduces new challenges, as the quantum circuits that transform features consume significant computational resources and increase the algorithm’s time complexity. In this study, we propose a novel framework combining RQCs and saliency map techniques to address the computational resource problem in quantum deep learning. Our results show that the framework reduces the algorithm’s computation time by a factor of 2.34 while achieving an accuracy of 92.51%, 2.2% higher than the quantum baseline version. This reduction is highly significant in the current noisy intermediate-scale quantum (NISQ) era, where noise significantly impacts the accuracy of hardware computation outputs and quantum hardware access is relatively costly.

Ban Q. Tran, Chuong K. Luong, Susan Mengel
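The saliency side of the framework above can be hinted at with a classical sketch (no quantum circuit here, and this is not the authors' algorithm): rank image patches by total saliency and keep only the top few, so the expensive RQC transform would be applied to far fewer patches. Patch size and the saliency map are made-up.

```python
import numpy as np

def top_salient_patches(saliency, patch=4, keep=2):
    """Return (row, col) corners of the `keep` most salient non-overlapping patches."""
    h, w = saliency.shape
    scores, coords = [], []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            scores.append(saliency[i:i + patch, j:j + patch].sum())
            coords.append((i, j))
    order = np.argsort(scores)[::-1][:keep]   # highest total saliency first
    return [coords[k] for k in order]

# Toy 8x8 saliency map with two clearly salient quadrants.
sal = np.zeros((8, 8))
sal[0:4, 4:8] = 1.0      # most salient patch
sal[4:8, 0:4] = 0.5      # second most salient
picked = top_salient_patches(sal, patch=4, keep=2)
```

Only the selected patches would then be fed through the quantum feature transform, which is where the reported time savings come from.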
Approximated Outlier Selection and Visualization with Approximated t-SNE

Working with high-dimensional data is inherently difficult due to its sparse distribution and structural complexity, which often obscure underlying patterns and reduce the effectiveness of visual analysis. Although techniques like t-Distributed Stochastic Neighbor Embedding (t-SNE) are commonly used to map such data into lower dimensions while preserving local structures, their heavy computational demands (especially during initialization and when computing pairwise similarities) limit their use in applications that require fast or interactive performance. This paper presents a refined framework based on Approximated t-SNE (A-tSNE) to address these issues. This more efficient variant of t-SNE employs approximate k-nearest neighbor (k-NN) searches. The refined A-tSNE is combined with sparsified graph construction based on estimates of local intrinsic dimensionality and a dynamic mechanism for detecting outliers. A key component of the method is the Approximated Outlier Selection Factor (AOSF), which allows anomalous points to be identified and filtered out before the visual representation is generated. Experimental validation on the MNIST and CIFAR-100 datasets reveals that this approach produces more precise and more informative visualizations by sharpening class boundaries, improving cluster separation, and preserving neighborhood consistency. These enhancements are further supported by quantitative assessments using trustworthiness and accuracy scores. The proposed method delivers a scalable, interactive, and outlier-aware visualization strategy that effectively balances computational efficiency with robust anomaly handling in high-dimensional data analysis.

Dharamsotu Bheekya, Salman Abdul Moiz, C. Raghavendra Rao
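The pre-embedding outlier filtering described above can be illustrated with a generic k-NN distance score. Note this is a simple stand-in, not the paper's AOSF (whose exact definition is not reproduced in the abstract): points whose mean distance to their k nearest neighbours is large are flagged and could be removed before running the embedding.

```python
import numpy as np

def knn_outlier_scores(X, k=3):
    """Mean distance to the k nearest neighbours of each row of X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)[:, 1:k + 1]   # column 0 is self-distance (0)
    return d_sorted.mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X[0] += 10.0                                    # plant one clear outlier
scores = knn_outlier_scores(X, k=3)             # the planted point scores highest
```

A threshold on these scores then decides which points are excluded from the A-tSNE input; the paper's AOSF additionally ties into its sparsified graph construction.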
Spread-Learned Spatial Features to Improve Tick-Shape Networks

The rapid growth of mobile devices and embedded systems requires lightweight networks that deliver high performance at reasonable complexity. Recently, TickNets have been proposed to meet that requirement by connecting several tick-shape backbones. However, their performance is still modest due to the lack of spatial features exploited from a basic tick-shape backbone. To mitigate this problem, an efficient perceptron is proposed to take spread-learned spatial features into account, improving the learning ability of tick-shape networks. This spread-learned feature extractor is obtained simply by adding a Full Residual Point-Depth-Point (FR-PDP) block to the beginning of a basic backbone. Such a strategy ensures two practical benefits for tick-shape networks: i) exploiting the identical FR-PDP-based features in a tick-shape backbone; and ii) extracting more discriminative spatial features for the learning process. Finally, STickNets are formed by simply connecting several spread-learned tick-shape backbones. Experimental results on various benchmark datasets indicate that our proposal significantly boosts the performance of tick-shape networks. In particular, STickNet-basic is enhanced by ~3.5% on CIFAR-100 and by up to 7.4% on Stanford Dogs. The implementation code is available at https://github.com/ngochc/STicknets .

Canh Ngoc Hoang, Thanh Phuong Nguyen, Hoang Anh Pham, Thinh Vinh Le, Thi-The Phan, Thanh Tuan Nguyen
An Overview of the Effectiveness of Graph Learning Methods for Traffic Demand Forecasting

Traffic demand forecasting plays a crucial role in intelligent transportation systems and is a fundamental aspect of smart cities. The spatial-temporal nature of this task poses significant challenges for forecasting models, especially when it comes to extracting spatial features from complex graphs. To effectively capture these intricate spatial patterns, previous studies have explored a variety of methods for constructing a graph from spatial data. In this study, we present a thorough survey and taxonomy of existing research based on various graph construction methods, including static, adaptive, and dynamic approaches. To thoroughly evaluate these models and methodologies, we conduct experiments on seven real-world datasets. Among these, two are widely recognized benchmarks, while the other five have been newly collected and processed by us from government open data platforms. Our findings enable us to analyze and compare the strengths and limitations of various approaches. We also identify emerging trends and assess the current effectiveness of these methods. Finally, we propose potential research directions and opportunities for future work in this field.

Luong-Chi Trung, Chung-Thai Kiet, Nguyen-Huu An, Dung-Cam Quang
Decomposition-Based Optimization of Multi-camera Networks for Coverage Maximization Problem

The proliferation of low-cost, high-efficiency surveillance cameras has made multi-camera networks a ubiquitous tool for security and monitoring in public spaces. However, the optimal placement of these cameras to maximize coverage while minimizing resources presents a significant and computationally complex challenge, known to be NP-hard. This research introduces a novel and effective model to address the camera planning problem. We formulate the problem and propose a decomposition strategy that partitions the main problem into smaller, manageable subproblems based on a neighbor graph of potential camera locations. This decomposition allows for the efficient application of well-established solvers to find optimal or near-optimal solutions within practical time constraints. Our model utilizes a 2D map visualization of sensor views, which simplifies the calculation of coverage areas and ensures the direct applicability of our solutions to real-world scenarios. Experimental results, conducted on a set of candidate locations within a university campus, demonstrate the proposed model’s superior performance. The formulation successfully maximizes spatial coverage under limited computational budgets, validating its effectiveness and potential for deployment in practical surveillance network design. This approach offers a significant advancement in strategic camera placement, providing a scalable and efficient solution to a critical security problem.

Dang Phuoc Vinh Hung, Nguyen Cao Dat, Tran Van Hoai, Nguyen Huu Hieu
Bridging Language and Vision: Fine-Tuning Latent Diffusion Models for Robust Text-to-Image Generation

This paper presents a progressive approach to text-to-image generation by developing an AI system capable of producing vivid and semantically faithful images from natural language descriptions. We adopt a multi-stage pipeline beginning with foundational experiments using Generative Adversarial Networks (GANs) on the MNIST dataset, gradually scaling to more complex datasets such as COCO. To bridge the semantic gap between textual and visual modalities, we integrate a BERT-based text encoder with a Convolutional Neural Network (CNN)-based generator, enabling the system to capture nuanced textual features and translate them into coherent visual representations. Building on these foundations, we leverage the Latent Diffusion Model (LDM) to enhance image quality and fidelity. Our contributions include model optimization, fine-tuning of LDM components, and a detailed evaluation of robustness across various textual inputs. Notably, we identify failure cases involving typographical and contextual input variations, highlighting key limitations in current diffusion-based models. These findings inform our recommendations for future improvements in semantic alignment and input resilience. Overall, our work demonstrates the effectiveness of combining transformer-based text encoders with generative architectures and sets the stage for more robust, high-fidelity text-to-image synthesis systems.

Daniel Vadranapu, Abhiram Yadav Myla, Charan Ramtej Kodi
Principal Directions-Based Data Classification Optimized by Genetic Algorithms and K-Nearest Neighbors

In this paper, we propose an optimized version of the Principal Directions algorithm by integrating Genetic Algorithms (GAs) and the k-Nearest Neighbors (KNN) technique, with applications in image class classification. The algorithm enhances the step of quantifying the disturbance introduced to the principal directions when new images are added to existing image classes. The optimal number of principal directions required for effective classification is determined using a genetic algorithm, while an improved version of the KNN algorithm is employed to optimize the key parameter used during testing, namely the number of nearest neighbors. For validation and testing, we apply the proposed method to the classification of face image classes corresponding to multiple individuals. Several variants of the algorithm are developed, differing in the similarity measure used within the KNN algorithm and the type of crossover operator employed during binary gene recombination (e.g., one-point, two-point, or multi-point crossover). Experimental results demonstrate that the proposed approach achieves high accuracy in recognizing new face images, validating its effectiveness.

Doru Constantin, Costel Bălcău
FedIncSparse: A Federated Learning Framework with Top-K Sparse Delta Transmission and Incremental Updates

While federated learning (FL) enables privacy-preserving model training across decentralized devices, its performance is often hampered by non-independent and identically distributed (non-i.i.d.) data and high communication overhead. This paper introduces FedIncSparse, a new FL framework that addresses these dual challenges through top-K percent sparse delta updates. This approach combines two key advantages: delta transmission slashes communication volume by sending only parameter changes, while incremental updates prevent client drift by gradually refining the global model. We provide robust convergence guarantees for both convex and non-convex objectives. Evaluations on standard benchmarks (MNIST, Fashion-MNIST, CIFAR-10) show FedIncSparse achieves superior accuracy, faster convergence, and up to 35% lower communication costs than FedAvg and FedProx. Furthermore, it delivers performance comparable to FedZip but with reduced computational demands. (Code is available at: https://github.com/dongld-2020/FedIncSparse .)

Duy-Dong Le, Duy-Thanh Huynh, Tuong-Nguyen Huynh
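The top-K percent sparse delta idea above can be sketched concretely. This illustrates the general mechanism, not the authors' exact protocol: the client sends only the indices and values of the largest-magnitude parameter changes, and the server applies them incrementally to the global model.

```python
import numpy as np

def sparsify_delta(local, global_w, k_percent=10.0):
    """Keep only the top-K% of parameter changes by magnitude."""
    delta = local - global_w
    k = max(1, int(len(delta) * k_percent / 100.0))
    idx = np.argsort(-np.abs(delta))[:k]       # largest |delta| first
    return idx, delta[idx]                     # sparse payload to transmit

def apply_update(global_w, idx, values):
    """Server side: incrementally patch only the transmitted coordinates."""
    new_w = global_w.copy()
    new_w[idx] += values
    return new_w

rng = np.random.default_rng(0)
g = rng.normal(size=100)                       # global model (toy, 100 params)
local = g + rng.normal(scale=0.1, size=100)    # client model after local training
idx, vals = sparsify_delta(local, g, k_percent=10.0)   # send 10 of 100 values
g_new = apply_update(g, idx, vals)
```

Only `idx` and `vals` cross the network, which is where the reported communication savings come from; untouched coordinates keep their previous global values.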
Metric-Weighted Voting Classifier with SMOTEENN for Enhanced Machine Learning Cardiovascular Disease Prediction

Cardiovascular diseases (CVDs) remain a leading cause of global mortality, underscoring the need for reliable and interpretable predictive models to support early diagnosis and clinical decision-making. This study proposes a hybrid ensemble framework that integrates SMOTEENN resampling with a metric-weighted Voting Classifier to address class imbalance and improve prediction robustness. The publicly available Kaggle CVD dataset (≈70,000 records) was split into 80% training and 20% testing using stratified sampling, and we evaluated a diverse pool of base classifiers including Decision Tree, Logistic Regression, K-Nearest Neighbour, Random Forest, LightGBM, XGBoost, and Multilayer Perceptron. Performance metrics such as Accuracy, Precision, Recall, F1-score, Matthews Correlation Coefficient (MCC), and ROC-AUC were employed to provide a comprehensive evaluation. Results demonstrate that the proposed SMOTEENN + metric-weighted ensemble significantly outperformed the baseline (non-resampled) and existing models, improving accuracy from 72.6% to 96.9%, F1-score from 0.72 to 0.97, and ROC-AUC from 0.73 to 0.97. Moreover, the metric-weighting scheme allowed aggregation to be guided by clinically meaningful criteria, enhancing sensitivity and balanced classification.

Emmanuel Ileberi, Yansia Sun
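Metric-weighted soft voting, as used in the framework above, can be sketched as follows. This is a hedged illustration with made-up numbers; the paper's exact weighting criterion may differ. Each base model's predicted class probabilities are averaged with weights proportional to a validation metric (here a hypothetical per-model F1), so stronger models dominate the vote.

```python
import numpy as np

def weighted_vote(probas, metric_scores):
    """probas: (n_models, n_samples, n_classes); returns fused class predictions."""
    w = np.asarray(metric_scores, dtype=float)
    w = w / w.sum()                              # normalize metric weights
    fused = np.tensordot(w, probas, axes=1)      # weighted average of probabilities
    return fused.argmax(axis=1)

# Three models, two samples, binary task (illustrative numbers only).
probas = np.array([
    [[0.6, 0.4], [0.3, 0.7]],    # model A
    [[0.4, 0.6], [0.2, 0.8]],    # model B
    [[0.7, 0.3], [0.6, 0.4]],    # model C
])
f1 = [0.9, 0.8, 0.5]             # hypothetical validation F1 per model
preds = weighted_vote(probas, f1)
```

With these numbers, the high-F1 models A and B outvote C on the first sample and all three agree on the second; swapping the metric (e.g. MCC for F1) only changes the weight vector.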
Disentangled Latent Augmentation for Abnormality Detection in Musculoskeletal Radiographs

Deep learning has shown impressive results in computer vision, but it often underperforms on imbalanced datasets. This challenge is prominent in medical imaging, particularly in the classification of musculoskeletal disorders, where minority classes are underrepresented. This study presents a generative framework for the detection of musculoskeletal abnormalities with a specific focus on the problem of data imbalance in the classification of radiographic images. A disentanglement-driven approach is employed using a β-variational autoencoder (β-VAE), which facilitates the generation of diverse and class-consistent samples through latent space manipulation. These synthetic samples are utilized within a triplet network for metric learning, enhancing discriminative representation by promoting greater inter-class separability and intra-class compactness in the latent space, thus effectively mitigating imbalance during classification. In experimental evaluation on the musculoskeletal radiograph (MURA) dataset, the proposed triplet network with β-VAE improves classification performance, achieving 11.3% higher accuracy and 42.3% greater Cohen’s kappa on the finger study type (DenseNet-169), and a 15.6% accuracy gain with a 27.1% Cohen’s kappa improvement on the forearm study type (ResNet-50), demonstrating its effectiveness for imbalance-aware diagnostic imaging.

Thota Gokaramaiah, Korra Sathya Babu, K Nagaraju, Nenavath Srinivas Naik
Analogical Proportions and Probabilities: Are They Compatible?

Analogical proportions link four items a, b, c, and d by a relation stating that “a is to b as c is to d,” where a, b, c, d are the formal representations of four real-world entities such as profiles. As such, a, b, c, d could be atomic values (such as Boolean, real, or nominal values), or more generally vectors thereof. In this context, we may attach to a vector representing the profile of a group of individuals in a given population the probability (frequency) of this group in the population. This raises the question of whether four profiles whose descriptions form an analogical proportion also have attached probabilities that form an analogical proportion when viewed as real numbers. This issue is explored and illustrated in this paper, which also provides some new results on numerical analogical proportions.

Henri Prade, Gilles Richard
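One common numerical reading of the question above (an illustration, not necessarily the definition the paper adopts) is the arithmetic proportion, where "a is to b as c is to d" holds when a - b = c - d. A tiny check for probabilities attached to four hypothetical groups:

```python
def arithmetic_proportion(a, b, c, d, tol=1e-9):
    """Numerical analogical proportion in the arithmetic reading: a - b = c - d."""
    return abs((a - b) - (c - d)) <= tol

# Probabilities (frequencies) of four hypothetical groups in a population.
assert arithmetic_proportion(0.4, 0.3, 0.2, 0.1)        # 0.1 == 0.1, proportion holds
assert not arithmetic_proportion(0.4, 0.3, 0.2, 0.05)   # 0.1 != 0.15, it fails
```

The paper's question is whether such a relation among the probabilities can be expected whenever the four underlying profile descriptions themselves form an analogical proportion.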
A Deep-Learning Framework for Land-Sliding Classification from Remote Sensing Image

The use of satellite imagery combined with deep learning to support automatic landslide detection is becoming increasingly widespread. However, selecting an appropriate deep learning architecture to optimize performance while avoiding overfitting remains a critical challenge. To address these issues, we propose a deep learning-based framework for landslide detection from remote sensing images. The proposed framework presents an effective combination of online and offline data augmentation to tackle imbalanced data, a backbone EfficientNetV2-Large deep learning model for extracting robust embedding features, and a post-processing SVM classifier to balance and enhance the classification performance. The proposed model achieved an F1-score of 0.8938 on the public test set of the Zindi challenge.

Quang-Hieu Tang, Nhat-Truong Vo Dinh, Dong-Dong Pham, Quoc-Toan Nguyen, Lam Pham, Truong Nguyen
DPDiff: Blind Image Inpainting with Dual-Prior Diffusion

Blind image inpainting aims to restore images degraded by unknown corruption, where the locations and shapes of missing regions are unspecified at inference time. Existing methods typically separate mask estimation and image restoration into sequential stages, which can lead to error propagation and poor integration of structural priors, especially when dealing with complex or diverse artifacts. In this paper, we present Dual-Prior Diffusion (DPDiff), a novel framework that addresses blind inpainting by jointly predicting corruption masks and reconstructing edge maps. This simultaneous prediction of dual priors complementarily rectifies one another, significantly reducing error accumulation and enabling robust, structure-aware restoration. Leveraging these learned priors, DPDiff guides a diffusion model to generate restored images that both preserve geometric fidelity and blend seamlessly with uncorrupted content. Extensive experiments demonstrate that DPDiff sets a new state-of-the-art on multiple blind inpainting benchmarks using only four diffusion timesteps, and generalizes strongly to related image restoration tasks such as image deraining and watermark removal.

Hoai Trung Nguyen, Trong Nhan Ho, Duc Dung Nguyen
AFC-Net: Attention-Guided Multi-backbone Deep Fusion for Grape Leaf Disease Classification

Grape leaf diseases pose a significant threat to vineyard productivity and quality, underscoring the importance of early and accurate diagnosis. This paper introduces AFC-Net, an Attention-Guided Fusion Convolutional Network designed to enhance classification performance through the integration of multiple backbone architectures. Our model combines ResNet50, EfficientNetB0, and MobileNetV2 to extract diverse and complementary features from input images. These features are fused using an attention-based fusion mechanism that dynamically learns the contribution weights (α1, α2, α3) for each backbone, allowing the model to emphasize the most informative representations. We evaluated our approach on the grape disease dataset, which includes four classes: Black Rot, ESCA, Healthy, and Leaf Blight. AFC-Net achieves a classification accuracy of 99.83%, with a macro-averaged F1-score of 99.84%. Notably, the model attains perfect F1-scores for both the Healthy and Leaf Blight classes, demonstrating exceptional robustness across categories. To enhance model transparency, we employ Grad-CAM to visualize class-discriminative regions, providing valuable insights into the model’s decision-making process. These results highlight AFC-Net as a promising solution for real-world grape disease detection, contributing to the advancement of precision agriculture.

Hoang-Tu Vo, Nhon Nguyen Thien, Kheo Chau Mui, Huan Lam Le, Phuc Pham Tien, Hieu Nguyen Trung, Vuong Nguyen Phuc
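The attention-based fusion with learned weights α1, α2, α3 described above can be sketched in NumPy. This is a simplified stand-in: the three feature vectors are random placeholders for ResNet50 / EfficientNetB0 / MobileNetV2 outputs projected to a common size, and the logits are made-up values standing in for learned parameters; only the softmax-weighted combination is shown.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 16))       # one row per backbone (placeholder features)
logits = np.array([1.2, 0.4, -0.5])    # hypothetical learned fusion parameters
alpha = softmax(logits)                # contribution weights alpha_1..alpha_3
fused = alpha @ feats                  # attention-weighted sum of backbone features
```

During training the logits would be updated by backpropagation, letting the network shift weight toward whichever backbone is most informative for the task.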
Towards Compact and Efficient Vietnamese Domain-Specific LLMs via Knowledge Distillation

Large language models (LLMs) with billions of parameters achieve impressive results on domain-specific tasks but are often too resource-intensive for practical deployment. Knowledge distillation (KD) is an effective technique for compressing large models. It transfers knowledge from a teacher model to a smaller student model while maintaining performance. In this work, we focus on distilling SeaLLMs, which are large language models specifically designed for Southeast Asian languages (including Vietnamese). We transfer knowledge from a 7-billion-parameter teacher model to a smaller 1.5-billion-parameter student model. Our experiments on a Vietnamese domain-specific knowledge base show that KD from the fine-tuned QLoRA 7B teacher achieves a BERTScore-F1 of 0.7217, outperforming distillation from the base 7B teacher (0.7065) and fine-tuned student models without distillation. Moreover, the distillation process reduces the model size by approximately 78%, enabling more efficient deployment of domain-specialized LLMs. These findings demonstrate that teacher model quality critically impacts KD effectiveness and provide practical guidance for building compact, high-performance LLMs for specialized Vietnamese knowledge domains.

Huy Hoang Le Nguyen, Hiep Xuan Huynh
Early Prediction Under Class Imbalance for a Programming Course: Feature Selection and Data Augmentation

Predicting student performance in programming courses is critical for timely support and improved learning outcomes. We leverage LMS data from a flipped-classroom Programming Fundamentals course to predict in-lab performance from weekly pre-class and in-class assignments in a rolling, week-by-week setup with a five-level outcome. Building on published results that identify Random Forest (RF) as a strong baseline, we fix the prediction pipeline and add two components: (i) Recursive Feature Elimination with Cross-Validation (RFECV) for compact feature selection; and (ii) training-only class-imbalance handling with SMOTE and a GAN-based augmenter. On a cohort of 786 students across four in-lab assessments, RFECV yields small, consistent gains (mainly at the extremes), while augmentation is the main driver of class-wise balance: the GAN brings the most reliable macro-F1 improvements. Ablation further shows that training the generator in the full feature space outperforms pairing the GAN with early pruning. Overall, the approach improves recognition of mid-performing students without degrading performance at the extremes, enabling earlier, targeted actions.

Huy Tran, Quoc-Huy Le, Tien Vu-Van, Thi-Thiet Pham, Nguyen Huynh-Tuong, Khoa D. Vo
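
The SMOTE step mentioned above follows a simple idea: synthesize new minority-class points by interpolating between real ones. A minimal stdlib-only sketch, with made-up 2-D data (real SMOTE implementations such as imbalanced-learn's are more elaborate):

```python
import random

def smote_sample(minority, k=2, seed=0):
    # Generate one synthetic minority-class point by interpolating
    # between a random minority sample and one of its k nearest neighbours.
    rng = random.Random(seed)
    base = rng.choice(minority)
    # k nearest neighbours by squared Euclidean distance (excluding base).
    others = sorted((p for p in minority if p is not base),
                    key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))[:k]
    neigh = rng.choice(others)
    lam = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + lam * (b - a) for a, b in zip(base, neigh))

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
synthetic = smote_sample(minority)
```

Because every synthetic point lies on a segment between two real minority samples, the augmenter never leaves the minority region, which is also why the abstract contrasts it with a GAN that can model more complex class shapes.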
ParkiDxAI: An Explainable AI System for Parkinson’s Disease Diagnosis

Developing trustworthy AI systems in digital health remains a challenge, particularly in terms of explainability and reliability in clinical use. We present ParkiDxAI, a web-based clinical decision support system (CDSS) for Parkinson’s disease that integrates prediction, explanation, data storage, and communication of results. Using a real-world tabular dataset of 2,105 individuals with 32 variables, we benchmarked 10 machine-learning models on an independent test set. Within this intra-study comparison, CatBoost achieved the highest accuracy at 93.59%. ParkiDxAI provides dual-level explanations—global (SHAP) and local (LIME)—and quantifies explanation quality using faithfulness, fidelity, and sparsity. The system has been developed with a FastAPI backend, a MySQL database, and a React interface for scalable deployment. A small usability assessment suggested that the explanations and UI were clear and usable. Overall, ParkiDxAI aims to bridge the gap between model performance and clinical usability and can be adapted to other tabular clinical tasks.

Huynh-Dai-Nhan Tran, Minna Isomursu, Manh-Hung Trinh, Tan-Nguyen Ngo, Gia-Hau Le, Hoang-Anh Pham
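
Among the explanation-quality metrics the entry lists, sparsity is the simplest to state: an explanation is sparse when most feature attributions are negligible. A toy version (the thresholded-count definition and names here are illustrative, not necessarily the metric used in ParkiDxAI):

```python
def sparsity(attributions, eps=1e-3):
    # Fraction of features whose attribution magnitude is negligible:
    # sparser explanations concentrate evidence on fewer variables.
    small = sum(1 for a in attributions if abs(a) < eps)
    return small / len(attributions)

dense = sparsity([0.2, -0.1, 0.3, 0.15])   # evidence spread over all features
sparse = sparsity([0.9, 0.0, 0.0, 0.0])    # one dominant feature
```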
Grad-CAM-Driven Explainable Deep Learning Framework for Cervical Cancer Image Classification

Many existing Deep Learning (DL)-based approaches for cervical cancer diagnosis lack comprehensive architectural comparisons and fail to effectively integrate Explainable AI (XAI) methods, such as Grad-CAM. This study proposes a cervical cancer classification framework that utilizes several state-of-the-art DL models and employs Grad-CAM to generate heatmaps highlighting the specific image regions that influenced the model’s predictions. These visual explanations improve transparency and support clinical interpretation, thereby enhancing trust in AI-assisted diagnostic systems. The experimental results demonstrate that the proposed framework not only achieves high classification accuracy but also provides valuable visual insights, contributing to the development of more interpretable and reliable AI tools for medical image diagnosis.

Huynh-Dai-Nhan Tran, Tan-Phuoc Pham, Manh-Hung Trinh, Tan-Nguyen Ngo, Tan-Hung Nguyen, Hoang-Anh Pham
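
The Grad-CAM heatmaps in the entry above come from one formula: weight each convolutional feature map by the global average of its gradients with respect to the class score, sum, and apply ReLU. A dependency-free sketch on tiny hand-made 2x2 maps (real use operates on a network's last convolutional layer):

```python
def grad_cam(feature_maps, gradients):
    # feature_maps: list of K 2-D activation maps (lists of lists).
    # gradients:    matching list of K gradient maps w.r.t. the class score.
    # Channel weight = global average of that channel's gradients.
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    heat = [[0.0] * w for _ in range(h)]
    for wk, fmap in zip(weights, feature_maps):
        for i in range(h):
            for j in range(w):
                heat[i][j] += wk * fmap[i][j]
    # ReLU: keep only regions with positive influence on the class.
    return [[max(0.0, v) for v in row] for row in heat]

fmaps = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(fmaps, grads)
```

The ReLU is what makes the maps clinically readable: only image regions that push the prediction towards the class survive into the heatmap.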
Hybrid Genetic Algorithm with Caputo Fractional Derivative for Ambulance Routing Problems

The ambulance routing problem plays a critical role in emergency medical services, where timely ambulance response can significantly impact patient outcomes. However, urban traffic congestion and dynamic routing conditions pose challenges for existing optimization approaches. This study proposes CaputoGA, a novel hybrid metaheuristic that integrates the Caputo fractional derivative into the genetic algorithm framework. By introducing a memory-aware penalty term and an adaptive mutation rate based on this fractional derivative, CaputoGA improves convergence behavior and routing stability. The algorithm is further enhanced with a hybrid local search mechanism combining 2-opt and swap heuristics to refine the best routes. Experimental evaluations on benchmark datasets adapted from the vehicle routing problem demonstrate that CaputoGA outperforms the standard GA and two state-of-the-art hybrid algorithms, K-means simulated annealing tabu search and priority-based adaptive particle swarm optimization, in both route cost and robustness. CaputoGA substantially reduces the percentage difference relative to the known cost, achieving a difference as low as 3.98%, and requires shorter execution time than the GA to compute the best routes. These results validate the efficacy of incorporating memory-driven dynamics into evolutionary search, highlight the potential of fractional calculus for complex routing problems, and provide a foundation for future extensions to other metaheuristic approaches.

Jackel Vui Lung Chew
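
The memory-aware mechanism above can be sketched loosely: a Caputo-style discretisation assigns decaying weights to the history of fitness improvements, and a stagnating weighted average pushes the mutation rate up. This is a toy interpretation under stated assumptions (minimisation, invented constants), not the paper's CaputoGA:

```python
def caputo_weights(alpha, n):
    # Caputo/Grunwald-style discretisation weights b_k = (k+1)^(1-a) - k^(1-a);
    # they decay with k, so recent history dominates but old history still counts.
    return [(k + 1) ** (1 - alpha) - k ** (1 - alpha) for k in range(n)]

def adaptive_mutation_rate(fitness_history, alpha=0.5, base=0.05, gain=0.2):
    # fitness_history: route costs per generation (minimisation).
    # Memory-weighted average of recent improvements; stagnation
    # (small improvements) raises the mutation rate to escape local optima.
    improvements = [max(0.0, a - b) for a, b in
                    zip(fitness_history[:-1], fitness_history[1:])]
    if not improvements:
        return base
    w = caputo_weights(alpha, len(improvements))
    memory = sum(wi * di for wi, di in zip(w, reversed(improvements))) / sum(w)
    return base + gain / (1.0 + memory)

stagnant = adaptive_mutation_rate([10.0, 10.0, 10.0, 10.0])   # no progress
improving = adaptive_mutation_rate([10.0, 8.0, 6.0, 4.0])     # steady progress
```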
Formal Analysis of Ethical Autonomous Systems Under Uncertainty

Autonomous systems have become an important fixture in modern life. Certifying ethical decision-making in autonomous systems is necessary to minimize harm. In this work, we describe a formalism for ethical decision-making under uncertainty. Ethical principles are incorporated using deontic-logic rules of obligations, forbidden actions, and permissible actions. Uncertainty in the environment is modeled using an interval discrete-time Markov chain. A tractable probabilistic model-checking procedure over the interval discrete-time Markov chain model is constructed to evaluate ethical decision-making in a system. Experiments are performed using probabilistic model checking on this tractable model, and statistical inference is conducted on a prototype of a moving aircraft.

Joanna Godawa, Krishnendu Ghosh
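
An interval discrete-time Markov chain replaces each transition probability with an interval; model checking then bounds the probability of reaching a state over all valid distributions inside the intervals. A one-step sketch of that bounding idea (the state names and intervals are invented, and real checkers like PRISM solve the general case as an optimisation over paths):

```python
def one_step_bounds(intervals, target):
    # intervals: {state: (lo, hi)} transition-probability intervals from one
    # source state. Returns the tightest (lower, upper) probability of moving
    # to `target` over all valid distributions (they must sum to 1).
    lo_t, hi_t = intervals[target]
    lo_rest = sum(lo for s, (lo, _) in intervals.items() if s != target)
    hi_rest = sum(hi for s, (_, hi) in intervals.items() if s != target)
    return max(lo_t, 1.0 - hi_rest), min(hi_t, 1.0 - lo_rest)

# Example: even though the "permissible" interval alone allows 0.5,
# the "forbidden" interval's cap of 0.4 forces at least 0.6.
iv = {"permissible": (0.5, 0.9), "forbidden": (0.1, 0.4)}
bounds = one_step_bounds(iv, "permissible")
```

Deontic obligations then become threshold queries over such bounds, e.g. "the lower bound of reaching a permissible state must exceed 0.6".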
Opposite Color Multiscale Local Binary Pattern Features for the Prediction of Bread Edibility

Bread is one of the most widely consumed staple bakery foods in the world. As a consumable food product, its quality is a significant concern; it depends on the raw ingredients and the baking process used during preparation. After purchase, the quality, and in turn the shelf life, of the bread is likely to be affected by the storage method, so the edibility of the bread needs to be estimated. Most studies estimate this using sensory attribute measurements such as strange odor, crust color, taste, aroma, hard texture, and mold formation. In contrast, this study newly attempts to examine edibility directly through images. A new variant of texture-based Local Binary Pattern features is proposed for predicting edibility through analysis of hard texture and mold formation. As no benchmark bread sample dataset was available for the study, a new dataset of 18,513 images was created. Experiments show that the proposed Opposite Color Multiscale Local Binary Pattern features provide good estimates under majority voting with a reduced number of features after feature transformation and selection. The accuracy obtained is 0.8493, which is comparable with other common variants of local binary pattern features. Multiple classifiers were evaluated during the experimental analysis, and an ensemble approach performed best. As this image-based formulation of the problem is the first of its kind in the domain, it is likely to open up new challenges.

Kavitha Rajamani, Guru Devanur S
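
The classical building block behind the proposed features is easy to show: a basic 3x3 local binary pattern thresholds the eight neighbours against the centre pixel and reads the results as an 8-bit code. This sketch is the textbook LBP, not the paper's opposite-colour multiscale variant:

```python
def lbp_code(patch):
    # Basic 3x3 local binary pattern: threshold the 8 neighbours
    # against the centre pixel and pack the bits into an 8-bit code.
    c = patch[1][1]
    # Clockwise neighbour order starting at the top-left pixel.
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (i, j) in enumerate(order):
        if patch[i][j] >= c:
            code |= 1 << bit
    return code

# A flat patch sets every bit; a bright centre in a dark ring sets none.
flat = lbp_code([[5, 5, 5], [5, 5, 5], [5, 5, 5]])
spot = lbp_code([[0, 0, 0], [0, 9, 0], [0, 0, 0]])
```

Histograms of such codes over image regions give texture descriptors; the multiscale, opposite-colour variant extends the same thresholding idea across radii and colour-channel pairs.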
Saliency Retargeting Considering Aesthetics Quality Based on Deep Learning

This paper proposes two deep learning approaches for saliency retargeting while preserving aesthetic quality in images, addressing the limitations of conventional methods that focus solely on maximizing object saliency. The first approach, “OperatorNet”, estimates the intensity of multiple image editing operators to enhance the main object, utilizing EfficientNet-lite3 and multitask learning. The second approach, “GeneratorNet”, directly transforms images at the pixel level using the U-Net architecture. Both quantitative evaluations and subjective human assessments were conducted on images generated by the proposed methods. Experiments using the MS-COCO dataset confirmed that the proposed approaches achieved superior saliency retargeting while considering aesthetic quality compared to existing methods.

Kazuki Koike, Ryuichi Egoshi, Hironori Takimoto
MBAAF: Multi-branch Lightweight Architecture for Audio Spoofing Detection with Temporal Gating and CBAM-Based Attention Fusion

With the rapid advancement of AI-driven speech synthesis and voice conversion technologies, deepfake audio has emerged as a serious threat to communication integrity and cybersecurity. In this paper, we propose a lightweight hybrid attention architecture named MBAAF for audio spoofing detection, which integrates feature-specific attention modules across a multi-branch input pipeline. Specifically, Mel-Frequency Cepstral Coefficients (MFCC) and Short-Time Fourier Transform (STFT) branches are refined using Temporal Gating Blocks (TGB), while Constant-Q Cepstral Coefficients (CQCC) are enhanced with a Convolutional Block Attention Module (CBAM). These refined features are fused and processed by a compact ResNeSt backbone for final classification. Experiments show that MBAAF attains 0.15% EER and 0.011 t-DCF on ASVSpoof2019 LA and sets a new state-of-the-art of 0.092% EER on the In-the-Wild dataset, using only 135K parameters. These results highlight the effectiveness of heterogeneous attention placement and confirm that a compact, modular design can achieve both high accuracy and efficiency for real-time, low-resource deployment.

Khanh-Duy Cao-Phan, Thi Phuc Dang
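
The temporal gating idea used in the MFCC and STFT branches can be pictured as a per-frame sigmoid gate that scales each frame's feature vector. In this sketch the "learned" gate score is simply the frame mean (a stand-in for a learned projection), so it is an assumption-laden toy, not the paper's TGB:

```python
import math

def temporal_gate(frames):
    # frames: list of per-time-step feature vectors.
    # Each frame is scaled by sigmoid(score), so informative (high-energy)
    # frames pass through and quiet frames are suppressed.
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))
    gated = []
    for frame in frames:
        score = sum(frame) / len(frame)   # stand-in for a learned projection
        g = sigmoid(score)
        gated.append([g * v for v in frame])
    return gated

frames = [[4.0, 4.0], [-4.0, -4.0]]  # one "loud" frame, one "quiet" frame
out = temporal_gate(frames)
```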
ConvSelect-RAG: Bridging Query Enhancement and Document Filtering for Multi-turn Conversational AI

Multi-turn conversational AI systems using Retrieval Augmented Generation (RAG) often struggle with ambiguous user queries and inefficient retrieval from large document collections. Traditional chunk-level vector search introduces significant computational overhead, while missing critical context due to reliance on raw, under-specified queries. We propose ConvSelect-RAG, a novel three-stage framework that (1) enhances queries using conversation history, (2) pre-filters documents using metadata summaries before chunk-level retrieval, and (3) integrates both for context-aware response generation. Experiments on large conversational QA benchmarks demonstrate that ConvSelect-RAG reduces retrieval latency by 23.5%, improves response accuracy by 18.7%, and decreases overall computational overhead by 31.2% compared to existing RAG baselines. Our approach offers superior scalability for real-world applications by minimizing unnecessary searches and preserving contextual relevance, setting a new standard for efficient, accurate multi-turn conversational AI.

Khoa Tran Dang, Quan Thi Khac, Dang Le Binh, Duy Tran Ngoc Bao
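
The document pre-filtering stage described above can be sketched as a two-level search: score document-level metadata summaries first, then run chunk-level matching only inside the surviving documents. The word-overlap scoring and toy corpus below are illustrative stand-ins for the embedding-based retrieval a real RAG system would use:

```python
def retrieve(query_terms, docs, top_docs=2):
    # Stage 1: rank documents by overlap with their metadata summaries.
    # Stage 2: rank chunks, but only within the top-scoring documents.
    def overlap(terms, text):
        words = set(text.lower().split())
        return sum(1 for t in terms if t.lower() in words)
    ranked = sorted(docs, key=lambda d: overlap(query_terms, d["summary"]),
                    reverse=True)[:top_docs]
    scored = [(overlap(query_terms, c), c)
              for d in ranked for c in d["chunks"]]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for score, c in scored if score > 0]

docs = [
    {"summary": "billing invoices refunds",
     "chunks": ["refunds are issued within 7 days", "invoices are monthly"]},
    {"summary": "hiking trail maps",
     "chunks": ["the summit trail closes in winter"]},
]
hits = retrieve(["refunds", "invoices"], docs, top_docs=1)
```

The latency saving the abstract reports comes from exactly this structure: chunks in filtered-out documents are never scored at all.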
A Comprehensive Solution to Early Fault Detection in Continuous Integration Servers

Continuous Integration (CI) servers are widely used and essential for automated software testing and deployment. To serve customers' services smoothly and reliably, CI servers are expected to be fault-tolerant, making their reliability, and thus availability, critical. At the same time, they are complex, with many metrics that need to be monitored in real time from many different perspectives. Several existing works address early fault detection in CI servers, but the problem remains unsolved due to incomplete solutions and the complex, dynamic nature of server metrics, motivating our work. In this paper, we propose a comprehensive solution to early fault detection in CI servers, based on machine learning and statistical methods for metric value prediction followed by fault detection. For prediction, we define a new hybrid model that integrates AutoRegressive Integrated Moving Average (ARIMA) linear modeling with Random Forest's capability to capture non-linear interactions. The novelty of our model lies in its stacking mechanism, which effectively enhances the component models and yields better predictions than either alone. For detection, a combined statistical method is defined to accurately identify future faults in the server metrics. As evaluated on real-world datasets containing metric values from CI server nodes across multiple global sites, our solution is more effective while conserving computational resources for CI servers.

Khuong Nguyen, Chau Vo
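
The stacking idea above, a linear stage whose residuals are corrected by a second learner, can be shown with deliberately trivial stand-ins: a naive drift forecast in place of ARIMA, and a median-of-residuals correction in place of the Random Forest. The series values are made up:

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0

def fit_stack(series):
    # Stage 1: linear drift forecast (stand-in for ARIMA).
    drift = (series[-1] - series[0]) / (len(series) - 1)
    linear_preds = [series[i] + drift for i in range(len(series) - 1)]
    # Stage 2: learn a correction from the first stage's residuals
    # (stand-in for the Random Forest's non-linear residual model).
    residuals = [actual - pred
                 for pred, actual in zip(linear_preds, series[1:])]
    correction = median(residuals)
    return lambda last: last + drift + correction

# A spiky metric series: the drift overshoots typical steps, and the
# residual stage learns to pull the forecast back down.
series = [0.0, 0.0, 0.0, 0.0, 4.0]
forecast = fit_stack(series)(series[-1])
```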
Balancing Accuracy and Latency in Privacy-Preserving User Behavior Classification with Tree-Based Models

Fully Homomorphic Encryption (FHE) enables machine learning models to operate directly on encrypted data, ensuring privacy-preserving analytics for sensitive user behavior information. However, the computational overhead of FHE raises concerns about efficiency in real-time applications. In this study, we evaluate three representative tree-based classifiers on encrypted user behavior data. The experiments are conducted under different computation depths and quantization bit-widths to examine their influence on accuracy and inference latency. Our results show that while all three models can be executed within the FHE framework, XGBoost consistently outperforms the others, achieving superior predictive accuracy with inference times suitable for near real-time analysis. These findings indicate that XGBoost offers the most effective balance between accuracy and efficiency for privacy-preserving user behavior classification.

Kiet Nguyen Tuan, Vo Minh Tri, Nguyen Duc Thai
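
The quantization bit-width trade-off studied above is concrete: FHE tree evaluation compares integers, so features and split thresholds must be mapped onto a 2^bits grid, and too few bits merge values a plaintext tree would separate. A plaintext sketch of that effect (toolkits such as Concrete ML handle the actual encrypted comparison; the values here are invented):

```python
def quantize(x, bits, lo, hi):
    # Map a real value into an unsigned integer grid of 2^bits levels.
    levels = (1 << bits) - 1
    q = round((x - lo) / (hi - lo) * levels)
    return max(0, min(levels, q))

def tree_decision(feature, threshold, bits, lo=0.0, hi=1.0):
    # Integer comparison mimicking one encrypted tree split.
    return quantize(feature, bits, lo, hi) <= quantize(threshold, bits, lo, hi)

# At 2 bits the grid cannot separate 0.30 from 0.35, so the split
# goes the wrong way; at 6 bits it matches the plaintext comparison.
coarse = tree_decision(0.35, 0.30, bits=2)
fine = tree_decision(0.35, 0.30, bits=6)
```

Higher bit-widths restore accuracy but deepen the encrypted circuit, which is the accuracy-latency balance the abstract measures.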
Physically-Grounded Goal Imagination: Physics-Informed Variational Autoencoder for Self-supervised Reinforcement Learning

Self-supervised goal-conditioned reinforcement learning enables robots to autonomously acquire diverse skills without human supervision. However, a central challenge is the goal-setting problem: robots must propose feasible and diverse goals that are achievable in their current environment. Existing methods such as RIG (Visual Reinforcement Learning with Imagined Goals) use a variational autoencoder (VAE) to generate goals in a learned latent space but can produce physically implausible goals that hinder learning efficiency. We propose Physics-Informed RIG (PI-RIG), which integrates physical constraints directly into the VAE training process through a novel Enhanced Physics-Informed Variational Autoencoder (Enhanced $p^3$-VAE). Our key innovation is the explicit separation of the latent space into physics variables governing object dynamics and environmental factors capturing visual appearance, while enforcing physical consistency through differential-equation constraints and conservation laws. This enables the generation of physically consistent and achievable goals that respect fundamental physical principles such as object permanence, collision constraints, and dynamic feasibility. Through extensive experiments, we demonstrate that physics-informed goal generation significantly improves the quality of proposed goals, leading to more effective exploration and better skill acquisition in visual robotic manipulation tasks including reaching, pushing, and pick-and-place scenarios.

Lan Thi Ha Nguyen, Kien Ton Manh, Anh Do Duc, Nam Pham Hai
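
One way a differential-equation constraint can enter a VAE loss is as a penalty on decoded trajectories that violate known dynamics. This toy example penalises deviation from constant-gravity free fall via finite differences; it is an assumed illustration of the general mechanism, not the paper's Enhanced $p^3$-VAE objective:

```python
def physics_penalty(positions, dt, g=9.81):
    # positions: 1-D heights of an object sampled every dt seconds.
    # Finite-difference acceleration should match -g under free fall;
    # the mean squared violation is added to the training loss.
    accels = [(positions[i + 1] - 2 * positions[i] + positions[i - 1]) / dt ** 2
              for i in range(1, len(positions) - 1)]
    return sum((a + g) ** 2 for a in accels) / len(accels)

dt = 0.1
# Free fall from height 10: h(t) = 10 - 0.5 * g * t^2, sampled every dt.
fall = [10 - 0.5 * 9.81 * (k * dt) ** 2 for k in range(5)]
teleport = [10.0, 10.0, 0.0, 10.0, 0.0]  # physically implausible goal path
ok = physics_penalty(fall, dt)
bad = physics_penalty(teleport, dt)
```

During training, goals decoded from the physics part of the latent space inherit low penalties, which is what keeps imagined goals achievable.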
A Fusion Model for Precipitation Nowcasting from Radar and Satellite Images

Precipitation nowcasting is a crucial task with applications across various industries and services. Recent advancements in deep learning have enabled the use of radar imagery for precipitation nowcasting. This work proposes a fusion model that leverages both radar and satellite imagery to enhance precipitation nowcasting accuracy. Specifically, we employ a latent diffusion model called LDCast for local radar images and a large pretrained model called ClimaX for global satellite data. These models serve as baseline and foundation models, extracting temporal features. These features are then integrated within a U-Net architecture for final forecasting. We construct a local radar dataset, the Nha-Be Radar Dataset (NRD-1), and conduct experiments using NRD-1 as local data and ERA5 as global data. Despite being trained on a relatively small dataset, our model outperforms the baseline LDCast model. We further compare our model to a state-of-the-art precipitation nowcasting method to demonstrate its effectiveness.

Le Hong Trang, Bui Khanh Vinh, Phan Thanh An, Pham Tran Vu
Backmatter
Title
Multi-disciplinary Trends in Artificial Intelligence
Edited by
Thanh Tho Quan
Chattrakul Sombattheera
Hoang-Anh Pham
Ngoc Thinh Tran
Copyright Year
2026
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-9549-57-3
Print ISBN
978-981-9549-56-6
DOI
https://doi.org/10.1007/978-981-95-4957-3

The PDF files of this book were produced in accordance with the PDF/UA-1 standard to improve accessibility. This includes screen-reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognise the importance of accessibility and welcome enquiries about the accessibility of our products. If you have any questions or accessibility needs, please contact us at accessibilitysupport@springernature.com.