Multi-disciplinary Trends in Artificial Intelligence
18th International Conference, MIWAI 2025, Ho Chi Minh City, Vietnam, December 3–5, 2025, Proceedings, Part I
- 2026
- Book
- Editors
- Thanh Tho Quan
- Chattrakul Sombattheera
- Hoang-Anh Pham
- Ngoc Thinh Tran
- Book Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Singapore
About this book
This 3-volume set constitutes the proceedings of the 18th International Conference on Multi-disciplinary Trends in Artificial Intelligence, MIWAI 2025, held in Ho Chi Minh City, Vietnam, during December 3–5, 2025.
The 110 full papers presented in these proceedings were carefully reviewed and selected from 306 submissions. The papers focus on various topics in AI and its applications, such as deep learning, machine learning, computer vision, pattern recognition, and natural language processing.
Table of Contents
- Frontmatter
- Topological Activation Maps for Visual Representation Learning from Tabular Data
M. Achutha, Bhaskarjyoti Das
Abstract: The transformation of non-image data into visual representations suitable for deep learning remains a challenging frontier in machine learning. Many existing methods map tabular data to images via global feature-space layouts, which can reduce instance-specific nuance. We propose Topological Activation Maps (TAMs), a new framework that couples global feature topology with per-instance activations for more faithful and interpretable data transformations. TAMs use a two-phase embedding: (i) kernel projection and Self-Organizing Map training establish a prototype grid capturing global feature relationships; (ii) each sample generates a unique activation map by interacting with this grid through Gaussian-weighted distances. This design preserves both dataset-level structure and sample-specific characteristics. On benchmark classification datasets, TAMs deliver competitive or superior accuracy compared with strong ensemble baselines while offering a complementary perspective on interpretability.
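The per-instance activation step described in the abstract (a Gaussian of sample-to-prototype distances over a trained grid) can be sketched roughly as follows; the function name, grid shape, and sigma are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def activation_map(sample, prototypes, sigma=1.0):
    """Per-sample activation map over a prototype grid.

    prototypes: (H, W, d) grid of prototype vectors (e.g. from a trained
    SOM); the activation at each grid cell is a Gaussian of the
    sample-prototype distance, so nearby prototypes light up and distant
    ones fade out, yielding an image-like (H, W) array per sample."""
    d = np.linalg.norm(prototypes - sample, axis=-1)   # (H, W) distances
    return np.exp(-d**2 / (2.0 * sigma**2))            # (H, W) "image"

# toy 4x4 grid of 3-dim prototypes; the query equals one prototype exactly
rng = np.random.default_rng(0)
grid = rng.normal(size=(4, 4, 3))
img = activation_map(grid[1, 2], grid, sigma=0.5)
```

Because the query coincides with the prototype at cell (1, 2), that cell attains the maximum activation of 1.0 and all other cells decay with distance.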
- Robust Leaf Disease Classification via Deep Feature Concatenation and EfficientNetV
Ai My Thi Nguyen, Hoang Huy Le, Vinh Dinh Nguyen
Abstract: Plants play a vital role in global food security, ecological balance, and economic sustainability. However, their susceptibility to a wide range of leaf diseases poses a major threat to crop yield and quality. Early and accurate identification of these diseases is essential for effective management. Manual inspection methods, though widely used, are inefficient, subjective, and unsuitable for large-scale agricultural operations. To address this challenge, we propose a deep learning-based classification system that significantly improves disease recognition performance. The model is designed to classify leaf images from six plant species, including both healthy and diseased samples, across 14 distinct categories. Our architecture leverages EfficientNetV2B0 as the feature extractor, enhanced with custom dense layers and feature concatenation to capture subtle visual cues. Through preprocessing and balanced dataset splitting, combined with training techniques such as dropout, adaptive learning rate scheduling, and early stopping, the model achieves robust generalization. Experimental evaluation shows that our approach improves classification accuracy from a baseline of 90.18% to 92.86%.
- Shallow Transformers with Applications Towards Image and Text Classification
Akshay Badola, Vineet Padmanabhan, Rajendra Prasad Lal, Wilson Naik
Abstract: The Transformer architecture has quickly become extremely popular, as it achieves state-of-the-art performance in a variety of tasks with a relatively simple design of repeating blocks. Variants of Transformers are now staples in many classification tasks, including language modeling, image classification, and even object detection. The core aspect of the architecture, namely the sequential repeating blocks, has however remained unchanged throughout this time. In this work, we explore an alternative horizontally growing architecture that achieves similar results on the common tasks in which Transformers are proficient, while providing more controllability for parameter expansion due to the model's shallow nature. We compare with two standard models: BERT for natural language processing and ViT for computer vision. We show that our model achieves comparable results while maintaining very low depth, in some cases with just a single layer. To the best of our knowledge, this is the first study that demonstrates the possibility and efficacy of such models. We provide results on standard benchmarks: MNLI for natural language inference, and CIFAR-10 and CIFAR-100 for image classification. We also provide all the source code for our experiments. (The code can be found at https://github.com/akshaybadola/shallow-transformers.)
- Computational Analysis of Cyber Risk Management by AI Coaching of Network DDoS Attack Activity in European Industries
Aleksander Szymczak, Jan Treur, Peter H. M. P. Roelofsma
Abstract: This project investigates cyber risk management during Distributed Denial of Service (DDoS) attacks in European industries, which impact many critical sectors such as finance, telecommunications, and technology. Using adaptive network modeling and MATLAB, a model was developed to simulate the interaction between human decision-making and AI coaching under stress. The AI Coach, equipped with complete situational awareness, monitors employee behavior and intervenes when errors are detected. Through what-if analysis and risk assessment, it was found that factors such as detection speed and human error influence financial loss outcomes. The simulation results demonstrate that AI-supported intervention can significantly reduce system vulnerability by correcting human misjudgments in real time.
- Learning to Defer with Scoring Functions
Andrew Ponomarev
Abstract: Machine learning models in high-stakes applications often collaborate with human experts, requiring intelligent deferral mechanisms. While traditional rejection learning uses model uncertainty to decide deferrals, it ignores human error and lacks flexibility in coverage control. Learning-to-defer approaches address this by jointly optimizing a classifier and a rejector, using loss functions that account for both model and human performance. However, existing methods either disregard human resource constraints or require costly retraining when coverage needs change. We explore a scoring-based approach to learning to defer that addresses these limitations. Building on prior heuristic methods, we introduce a novel metric quantifying how effectively a deferral system leverages both model and human expertise across the feature space. We then derive the theoretically optimal deferral rule and develop a practical scoring function approximation that enables post-hoc coverage adjustment without retraining. Our method trains a scoring function to rank samples by their expected delegation benefit, which can be calibrated to meet dynamic coverage constraints. Experiments show this approach achieves superior accuracy-workload trade-offs compared to existing methods, providing both theoretical grounding and practical flexibility for human-AI collaboration.
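The post-hoc, coverage-adjustable deferral idea can be illustrated with a minimal scoring sketch. Here the score is simply model confidence minus an estimated human accuracy; the paper's learned scoring function is more sophisticated, and all names below are hypothetical:

```python
import numpy as np

def defer_mask(model_conf, human_acc, coverage):
    """Post-hoc deferral: score each sample by the model's expected gain
    over the human (model_conf - human_acc); keep the top `coverage`
    fraction for the model and defer the rest to the human expert.
    The particular score is illustrative, not the paper's exact rule."""
    score = model_conf - human_acc                 # benefit of NOT deferring
    n_keep = int(np.ceil(coverage * len(score)))
    keep = np.argsort(-score)[:n_keep]             # highest-benefit samples stay
    mask = np.zeros(len(score), dtype=bool)
    mask[keep] = True                              # True = model predicts
    return mask

conf = np.array([0.9, 0.55, 0.7, 0.4])
human = np.array([0.8, 0.8, 0.8, 0.8])
m = defer_mask(conf, human, coverage=0.5)
```

Because the threshold is applied at prediction time, the coverage budget can be changed without retraining anything, which is the flexibility the abstract emphasizes.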
- Prediction of Pregnancy-Related Adverse Drug Reactions from Chemical Conformers Using a Fractional-Pooling Dilated CNN
Anushka Chaurasia, Deepak Kumar, Yogita Yogita
Abstract: Adverse drug reactions (ADRs) during pregnancy represent a critical concern, as they can adversely affect both maternal and fetal health. However, the availability of clinical evidence on drug safety in this population has remained limited, primarily due to the ethical restrictions associated with conducting controlled trials in pregnant women. A range of computational approaches has been proposed to address this gap. Nonetheless, the majority of these methods have relied on one-dimensional or two-dimensional molecular descriptors, thereby neglecting the richer structural information contained within chemical conformers. In this work, we propose FracPool-DCNN, a novel deep learning architecture that integrates dilated convolutions with fractional max pooling to predict pregnancy-related ADRs directly from conformer images. Using a curated dataset of drugs from PubChem conformers and ADReCS-based annotations, the model has been trained and evaluated with five-fold cross-validation. FracPool-DCNN has achieved superior performance compared to nine baseline models, with a harmonic mean of 76.89%, AUPR of 77.42%, and ROC-AUC of 79.89%, while ablation studies confirm the critical contributions of fractional pooling and global average pooling. These findings highlight the promise of conformer-based deep learning for robust pregnancy drug-safety classification, offering a scalable approach to preclinical risk assessment.
- Compositional Distributional Semantics and the Conjunction Effect in Language Models
Yoshihiro Maruyama, Arisa Yasuda
Abstract: Compositional distributional semantics in natural language processing allows us to compose meaning vectors for complex phrases from those for word vectors. Here we apply the framework of compositional distributional semantics to construct meaning vectors for bank teller and feminist bank teller in the well-known Linda problem, and compare them via cosine similarity to a meaning vector encoding the Linda description provided in the problem. In a noun vector space, the meaning vector for the conjoined phrase feminist bank teller is substantially closer to the meaning vector for the Linda description than the meaning vector for the occupation-only phrase bank teller; crucially, this effect requires the full adjective-as-matrix treatment provided by compositional distributional semantics. In this way, we can provide a computational linguistic account of the conjunction effect in the Linda problem.
- Quantum Patches for Efficient Learning
Ban Q. Tran, Chuong K. Luong, Susan Mengel
Abstract: The problem of conserving computational resources while training models in deep learning has become urgent as models increasingly require more input data to improve accuracy. To enhance accuracy beyond increasing data, researchers have leveraged quantum properties, such as using random quantum circuits (RQCs) to transform input data, similar to how data augmentation techniques are applied, and have demonstrated their effectiveness. However, the use of RQCs introduces new challenges, as quantum circuits transforming features consume significant computational resources and increase the algorithm's time complexity. In this study, we propose a novel framework combining RQCs and saliency-map techniques to address the computational resource problem in quantum deep learning. The results show that our framework reduces the algorithm's computation time by a factor of 2.34 while achieving an accuracy of 92.51%, 2.2% higher than the quantum baseline version. This reduction is highly significant in the current noisy intermediate-scale quantum (NISQ) era, where noise significantly impacts the accuracy of hardware computation outputs, making quantum hardware access relatively costly.
- Approximated Outlier Selection and Visualization with Approximated t-SNE
Dharamsotu Bheekya, Salman Abdul Moiz, C. Raghavendra Rao
Abstract: Working with high-dimensional data is inherently difficult due to its sparse distribution and structural complexity, which often obscure underlying patterns and reduce the effectiveness of visual analysis. Although techniques like t-Distributed Stochastic Neighbor Embedding (t-SNE) are commonly used to map such data into lower dimensions while preserving local structure, their heavy computational demands, especially during initialization and when computing pairwise similarities, limit their use in applications that require fast or interactive performance. This paper presents a refined framework based on Approximated t-SNE (A-tSNE) to address these issues. The more efficient variant of A-tSNE employs approximate k-nearest neighbor (k-NN) searches. The refined A-tSNE is combined with sparsified graph construction based on estimates of local intrinsic dimensionality and a dynamic mechanism for detecting outliers. The Approximated Outlier Selection Factor (AOSF) is a key component of the method, which allows anomalous points to be identified and filtered out before generating the visual representation. Experimental validation on the MNIST and CIFAR-100 datasets reveals that this approach produces more precise and more informative visualizations by sharpening class boundaries, improving cluster separation, and preserving neighborhood consistency. These enhancements are further supported by quantitative assessments using trustworthiness and accuracy scores. The proposed method delivers a scalable, interactive, and outlier-aware visualization strategy that effectively balances computational efficiency with robust anomaly handling in high-dimensional data analysis.
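A simplified stand-in for the k-NN-based outlier filtering step (not the paper's exact AOSF definition) might look like the following, using a brute-force neighbor search in place of the approximate one:

```python
import numpy as np

def knn_outlier_scores(X, k=3):
    """Mean distance to the k nearest neighbours as an outlier score,
    a simplified stand-in for the paper's AOSF; points with unusually
    large scores are dropped before running (approximated) t-SNE."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)                      # row-wise ascending; column 0 is self (0)
    return D[:, 1:k + 1].mean(axis=1)   # average over the k nearest non-self

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
X = np.vstack([X, 10.0 * np.ones((1, 5))])   # append one obvious outlier
scores = knn_outlier_scores(X, k=3)
keep = scores < scores.mean() + 3 * scores.std()   # filter before embedding
```

The planted point at index 50 sits far from the Gaussian cloud, so its mean neighbor distance dominates the score distribution and it is excluded from the embedding input.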
- Spread-Learned Spatial Features to Improve Tick-Shape Networks
Canh Ngoc Hoang, Thanh Phuong Nguyen, Hoang Anh Pham, Thinh Vinh Le, Thi-The Phan, Thanh Tuan Nguyen
Abstract: The rapid growth of mobile devices and embedded systems requires lightweight networks with high performance at reasonable complexity. Recently, TickNets have been proposed to meet that requirement by connecting several tick-shape backbones. However, their performance is still at modest levels due to the lack of spatial features exploited from a basic tick-shape backbone. To mitigate this problem, an efficient perceptron is proposed to take into account spread-learned spatial features for improving the learning ability of tick-shape networks. Accordingly, this spread-learned feature extractor is implemented simply by adding a Full Residual Point-Depth-Point (FR-PDP) block to the beginning of a basic backbone. Such a strategy ensures two practical benefits for the tick-shape networks: i) exploiting the identical FR-PDP-based features in a tick-shape backbone; and ii) extracting more discriminative spatial features for the learning process. Finally, STickNets are formed by simply connecting several spread-learned tick-shape backbones. Experimental results on various benchmark datasets have indicated that our proposal significantly boosts the performance of tick-shape networks. In particular, STickNet-basic is enhanced by ~3.5% on CIFAR-100 and by up to 7.4% on Stanford Dogs. The implementation code is available at https://github.com/ngochc/STicknets.
- An Overview of the Effectiveness of Graph Learning Methods for Traffic Demand Forecasting
Luong-Chi Trung, Chung-Thai Kiet, Nguyen-Huu An, Dung-Cam Quang
Abstract: Traffic demand forecasting plays a crucial role in intelligent transportation systems and is a fundamental aspect of smart cities. The spatial-temporal nature of this task poses significant challenges for forecasting models, especially when it comes to extracting spatial features from complex graphs. To effectively capture these intricate spatial patterns, previous studies have explored a variety of methods for constructing a graph from spatial data. In this study, we present a thorough survey and taxonomy of existing research based on various graph construction methods, including static, adaptive, and dynamic approaches. To thoroughly evaluate these models and methodologies, we conduct experiments on seven real-world datasets. Among these, two are widely recognized benchmarks, while the other five have been newly collected and processed by us from government open data platforms. Our findings enable us to analyze and compare the strengths and limitations of various approaches. We also identify emerging trends and assess the current effectiveness of these methods. Finally, we propose potential research directions and opportunities for future work in this field.
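Two of the surveyed graph-construction families can be sketched in a few lines: a static, distance-based adjacency (the classic DCRNN-style construction) and an adaptive adjacency learned from node embeddings (Graph WaveNet-style). Shapes and parameter values here are illustrative assumptions:

```python
import numpy as np

def static_adj(dist, sigma=10.0, eps=0.5):
    """Static adjacency: Gaussian kernel on pairwise road-network
    distances, thresholded to keep only sufficiently close node pairs."""
    A = np.exp(-dist**2 / sigma**2)
    return np.where(A >= eps, A, 0.0)

def adaptive_adj(E1, E2):
    """Adaptive adjacency inferred from learnable node embeddings
    E1, E2: row-softmax of ReLU(E1 @ E2.T), so each node's outgoing
    weights form a probability distribution over neighbors."""
    S = np.maximum(E1 @ E2.T, 0.0)
    S = np.exp(S - S.max(axis=1, keepdims=True))
    return S / S.sum(axis=1, keepdims=True)

dist = np.array([[0.0, 5.0], [5.0, 0.0]])        # toy 2-node distance matrix
A = static_adj(dist)
rng = np.random.default_rng(2)
P = adaptive_adj(rng.normal(size=(4, 2)), rng.normal(size=(4, 2)))
```

Dynamic approaches, the third family in the taxonomy, additionally recompute the adjacency per time step, which does not fit in a two-line sketch.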
- Decomposition-Based Optimization of Multi-camera Networks for Coverage Maximization Problem
Dang Phuoc Vinh Hung, Nguyen Cao Dat, Tran Van Hoai, Nguyen Huu Hieu
Abstract: The proliferation of low-cost, high-efficiency surveillance cameras has made multi-camera networks a ubiquitous tool for security and monitoring in public spaces. However, the optimal placement of these cameras to maximize coverage while minimizing resources presents a significant and computationally complex challenge, known to be NP-hard. This research introduces a novel and effective model to address the camera planning problem. We formulate the problem and propose a decomposition strategy that partitions the main problem into smaller, manageable subproblems based on a neighbor graph of potential camera locations. This decomposition allows for the efficient application of well-established solvers to find optimal or near-optimal solutions within practical time constraints. Our model utilizes a 2D map visualization of sensor views, which simplifies the calculation of coverage areas and ensures the direct applicability of our solutions to real-world scenarios. Experimental results, conducted on a set of candidate locations within a university campus, demonstrate the proposed model's superior performance. The formulation successfully maximizes spatial coverage under limited computational budgets, validating its effectiveness and potential for deployment in practical surveillance network design. This approach offers a significant advancement in strategic camera placement, providing a scalable and efficient solution to a critical security problem.
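For intuition about the underlying coverage-maximization problem, a plain greedy heuristic looks like this; it is a standard baseline with a (1 - 1/e) approximation guarantee for submodular coverage, not the paper's decomposition method, and all names are illustrative:

```python
def greedy_cameras(coverage, budget):
    """Greedy baseline for coverage maximization: repeatedly add the
    candidate camera covering the most not-yet-covered cells.
    `coverage` maps candidate id -> set of covered grid cells."""
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(coverage, key=lambda c: len(coverage[c] - covered))
        if not (coverage[best] - covered):
            break                      # no candidate adds new coverage
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered

# three candidate locations, budget of two cameras
cams = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}}
picked, cells = greedy_cameras(cams, budget=2)
```

The greedy pass first takes "C" (4 new cells), then "A" (3 more), covering all seven cells; exact solvers on the decomposed subproblems, as in the paper, can close the remaining optimality gap.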
- Bridging Language and Vision: Fine-Tuning Latent Diffusion Models for Robust Text-to-Image Generation
Daniel Vadranapu, Abhiram Yadav Myla, Charan Ramtej Kodi
Abstract: This paper presents a progressive approach to text-to-image generation by developing an AI system capable of producing vivid and semantically faithful images from natural language descriptions. We adopt a multi-stage pipeline beginning with foundational experiments using Generative Adversarial Networks (GANs) on the MNIST dataset, gradually scaling to more complex datasets such as COCO. To bridge the semantic gap between textual and visual modalities, we integrate a BERT-based text encoder with a Convolutional Neural Network (CNN)-based generator, enabling the system to capture nuanced textual features and translate them into coherent visual representations. Building on these foundations, we leverage the Latent Diffusion Model (LDM) to enhance image quality and fidelity. Our contributions include model optimization, fine-tuning of LDM components, and a detailed evaluation of robustness across various textual inputs. Notably, we identify failure cases involving typographical and contextual input variations, highlighting key limitations in current diffusion-based models. These findings inform our recommendations for future improvements in semantic alignment and input resilience. Overall, our work demonstrates the effectiveness of combining transformer-based text encoders with generative architectures and sets the stage for more robust, high-fidelity text-to-image synthesis systems.
- Principal Directions-Based Data Classification Optimized by Genetic Algorithms and K-Nearest Neighbors
Doru Constantin, Costel Bălcău
Abstract: In this paper, we propose an optimized version of the Principal Directions algorithm by integrating Genetic Algorithms (GAs) and the k-Nearest Neighbors (KNN) technique, with applications in image class classification. The algorithm enhances the step of quantifying the disturbance introduced to the principal directions when new images are added to existing image classes. The optimal number of principal directions required for effective classification is determined using a genetic algorithm, while an improved version of the KNN algorithm is employed to optimize the key parameter used during testing, the number of nearest neighbors. For validation and testing, we apply the proposed method to the classification of face image classes corresponding to multiple individuals. Several variants of the algorithm are developed, differing in the similarity measure used within the KNN algorithm and the type of crossover operator employed during binary gene recombination (e.g., one-point, two-point, or multi-point crossover). Experimental results demonstrate that the proposed approach achieves high accuracy in recognizing new face images, validating its effectiveness.
- FedIncSparse: A Federated Learning Framework with Top-K Sparse Delta Transmission and Incremental Updates
Duy-Dong Le, Duy-Thanh Huynh, Tuong-Nguyen Huynh
Abstract: While federated learning (FL) enables privacy-preserving model training across decentralized devices, its performance is often hampered by non-independent and identically distributed (non-i.i.d.) data and high communication overhead. This paper introduces FedIncSparse, a new FL framework that addresses these dual challenges through top-K percent sparse delta transmission and incremental updates. This approach combines two key advantages: delta transmission slashes communication volume by sending only parameter changes, while incremental updates prevent client drift by gradually refining the global model. We provide robust convergence guarantees for both convex and non-convex objectives. Evaluations on standard benchmarks (MNIST, Fashion-MNIST, CIFAR-10) show FedIncSparse achieves superior accuracy, faster convergence, and up to 35% lower communication costs than FedAvg and FedProx. Furthermore, it delivers performance comparable to FedZip but with reduced computational demands. (Code is available at: https://github.com/dongld-2020/FedIncSparse.)
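The top-K sparse delta transmission can be sketched as follows; the `k_percent` name and the flat-vector representation are simplifying assumptions (real models hold per-layer tensors), and the incremental-update schedule is omitted:

```python
import numpy as np

def topk_sparse_delta(local, global_w, k_percent=10.0):
    """Client side: keep only the largest-magnitude k% of the parameter
    delta; the (indices, values) pair is all that gets transmitted.
    A minimal sketch of top-K sparsification, not the full protocol."""
    delta = local - global_w
    k = max(1, int(len(delta) * k_percent / 100.0))
    idx = np.argsort(-np.abs(delta))[:k]     # largest-|delta| coordinates
    return idx, delta[idx]

def apply_sparse_delta(global_w, idx, vals):
    """Server side: scatter the sparse delta back into the global model."""
    out = global_w.copy()
    out[idx] += vals
    return out

g = np.zeros(10)
l = np.array([0.0, 5.0, 0.1, 0.0, -3.0, 0.0, 0.0, 0.2, 0.0, 0.0])
idx, vals = topk_sparse_delta(l, g, k_percent=20.0)
new_g = apply_sparse_delta(g, idx, vals)
```

With k_percent=20 on a 10-dimensional vector, only the two largest-magnitude coordinates (here indices 1 and 4) are sent, which is where the communication savings come from.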
- Metric-Weighted Voting Classifier with SMOTEENN for Enhanced Machine Learning Cardiovascular Disease Prediction
Emmanuel Ileberi, Yansia Sun
Abstract: Cardiovascular diseases (CVDs) remain a leading cause of global mortality, underscoring the need for reliable and interpretable predictive models to support early diagnosis and clinical decision-making. This study proposes a hybrid ensemble framework that integrates SMOTEENN resampling with a metric-weighted Voting Classifier to address class imbalance and improve prediction robustness. Using the publicly available Kaggle CVD dataset (≈70,000 records), we split the data into 80% training and 20% testing with stratified sampling and evaluated a diverse pool of base classifiers including Decision Tree, Logistic Regression, K-Nearest Neighbour, Random Forest, LightGBM, XGBoost, and Multilayer Perceptron. Performance metrics such as Accuracy, Precision, Recall, F1-score, Matthews Correlation Coefficient (MCC), and ROC-AUC were employed to provide a comprehensive evaluation. Results demonstrate that the proposed SMOTEENN + metric-weighted ensemble significantly outperformed the baseline (non-resampled) and existing models, improving accuracy from 72.6% to 96.9%, F1-score from 0.72 to 0.97, and ROC-AUC from 0.73 to 0.97. Moreover, the metric-weighting scheme allowed aggregation to be guided by clinically meaningful criteria, enhancing sensitivity and balanced classification.
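Metric-weighted soft voting reduces to weighting each base model's predicted probabilities by a normalized held-out validation metric. A minimal sketch follows; the paper's exact weighting scheme may differ, and the toy numbers are illustrative:

```python
import numpy as np

def metric_weighted_vote(probas, metrics):
    """Soft voting where each base model's class probabilities are
    weighted by a validation metric (e.g. F1), normalised to sum to 1.
    probas: list of (n_samples, n_classes) arrays, one per model."""
    w = np.asarray(metrics, dtype=float)
    w = w / w.sum()                                        # normalise weights
    mixed = np.tensordot(w, np.asarray(probas), axes=1)    # (n_samples, n_classes)
    return mixed.argmax(axis=1)

# two toy binary classifiers on 3 samples; model 1 has the better F1
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
p2 = np.array([[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]])
pred = metric_weighted_vote([p1, p2], metrics=[0.9, 0.6])
```

On the second sample the models disagree (0.6 vs 0.3 for class 1); the higher-metric model's vote is down-weighted less, so the weighted mixture (0.48 vs 0.52) tips to class 0, which an unweighted average would not change here but can in closer cases.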
- Disentangled Latent Augmentation for Abnormality Detection in Musculoskeletal Radiographs
Thota Gokaramaiah, Korra Sathya Babu, K Nagaraju, Nenavath Srinivas Naik
Abstract: Deep learning has shown impressive results in computer vision, but it often underperforms on imbalanced datasets. This challenge is prominent in medical imaging, particularly in the classification of musculoskeletal disorders, where minority classes are underrepresented. This study presents a generative framework for the detection of musculoskeletal abnormalities with a specific focus on the problem of data imbalance in the classification of radiographic images. A disentanglement-driven approach is employed using a β-variational autoencoder (β-VAE), which facilitates the generation of diverse and class-consistent samples through latent space manipulation. These synthetic samples are utilized within a triplet network for metric learning, enhancing discriminative representation by promoting greater inter-class separability and intra-class compactness in the latent space, thus effectively mitigating imbalance during classification. In experimental evaluation on the musculoskeletal radiograph (MURA) dataset, the proposed triplet network with β-VAE improves classification performance, achieving 11.3% higher accuracy and 42.3% greater Cohen's kappa on the finger study type (DenseNet-169), and a 15.6% accuracy gain with a 27.1% Cohen's kappa improvement on the forearm study type (ResNet-50), demonstrating its effectiveness for imbalance-aware diagnostic imaging.
- Title
- Multi-disciplinary Trends in Artificial Intelligence
- Editors
- Thanh Tho Quan
- Chattrakul Sombattheera
- Hoang-Anh Pham
- Ngoc Thinh Tran
- Copyright Year
- 2026
- Publisher
- Springer Nature Singapore
- Electronic ISBN
- 978-981-95-4957-3
- Print ISBN
- 978-981-95-4956-6
- DOI
- https://doi.org/10.1007/978-981-95-4957-3
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.