
Computer Vision and Robotics

Proceedings of CVR 2025, Volume 2

  • 2026
  • Book

About this book

This book is a collection of high-quality research articles in the field of computer vision and robotics presented at the International Conference on Computer Vision and Robotics (CVR 2025), organized by the National Institute of Technology, Goa, India, during April 25–26, 2025. The book discusses applications of computer vision and robotics in areas such as medicine, defense, and smart city planning. It presents recent work by researchers, academics, industry practitioners, and policymakers.

Table of Contents

Frontmatter
Automating Medical Report Summarization: A Generative AI Approach for Enhanced Decision Support and Workflow Efficiency in Healthcare

In today’s healthcare systems, managing the vast and growing volume of clinical text, particularly pathological reports, remains a pressing challenge. To address this, we introduce an automated summarization framework designed to distill essential information from lengthy medical documents. The proposed system combines a Transformer-based encoder-decoder architecture with a Generative Adversarial Network (GAN) to enhance the accuracy and fluency of generated summaries. Prior to modeling, the input text undergoes rule-based preprocessing and Named Entity Recognition (NER) to identify and retain critical medical terms while eliminating irrelevant data. The Transformer module effectively captures complex contextual relationships within the document, while the GAN discriminator improves the summary’s coherence through adversarial refinement. We evaluated our model on a clinical dataset using standard summarization metrics, including ROUGE-1, ROUGE-2, and ROUGE-L. Comparative analysis with existing models such as BERTSUM and TextRank indicates that our approach yields more relevant and concise summaries. This solution aims to support healthcare professionals by streamlining the review of clinical texts and facilitating faster decision-making.

Palak Hajare, Mallika Hariharan, Snehal V. Laddha
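The ROUGE metrics used for evaluation above are token-overlap statistics. As a rough illustration (not the paper's evaluation code; the function name and example sentences are invented), ROUGE-1 can be sketched as clipped unigram overlap:

```python
from collections import Counter

def rouge1(reference: str, candidate: str):
    """Unigram-overlap ROUGE-1 precision, recall, and F1."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())          # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

p, r, f = rouge1("the patient shows no sign of infection",
                 "patient shows no infection")
```

ROUGE-2 and ROUGE-L follow the same pattern over bigrams and longest common subsequences, respectively.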
Generating Machine-Style Handwriting: A Diffusion Based Latent Generation with VAE Decoding

In this paper, we introduce the Style-Calligraphy model, an innovative architecture designed to generate high-fidelity images of text in specified machine styles, conditioned on a given text input. Our approach leverages the strengths of Variational Autoencoders (VAEs) and Latent Diffusion Models (LDMs) to address the challenges of latent space representation and efficient image generation. The VAE encoder-decoder framework is employed to learn structured latent spaces, mitigating the limitations of traditional autoencoders by incorporating Kullback-Leibler divergence alongside image reconstruction loss. This ensures a continuous and feasible latent space for sampling. The LDM is trained as a denoiser with text-based conditioning, utilizing a Markov chain to model the noise addition process and employing cross-attention mechanisms to enhance spatial character relationships. We introduce a novel sliding cross-attention technique using duplets and triplets to capture intricate dependencies between characters, significantly improving the model’s performance. Furthermore, we propose a stand-alone image decoder to address noise sensitivity, trained on both clean and noisy latent representations, resulting in a substantial increase in image quality. A key innovation of our work is the repurposing of a single LDM across multiple machine styles, drastically reducing training costs by isolating style-specific training to the image decoder. Our comprehensive training pipeline, optimized for efficiency, demonstrates the model’s capability to generate accurate and stylistically coherent text images, achieving a 99.5% success rate in high-quality sample generation on seen data.

Phani Kumar Nyshadham, Prasanna Biswas, Archie Mittal
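The VAE objective described above, combining a reconstruction loss with a Kullback-Leibler term, has a simple closed form when the posterior is a diagonal Gaussian. A minimal numpy sketch (the mean-squared reconstruction term and the `beta` weight are illustrative assumptions, not the paper's exact loss):

```python
import numpy as np

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Reconstruction error plus the closed-form KL divergence between
    N(mu, diag(exp(logvar))) and the standard normal prior."""
    recon = np.mean((x - x_hat) ** 2)                         # image reconstruction loss
    kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))  # KL regularizer
    return recon + beta * kl
```

The KL term is what keeps the latent space continuous and feasible for sampling, as the abstract notes.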
A Comparative Study of Image Synthesis Models: Stack GANs and Diffusion Based Text to Image Generation

In today’s world, where visuals communicate more effectively than words, text-to-image synthesis plays a crucial role across various sectors, helping them grow with a creative, image-centric approach. Machine learning has contributed significantly to this field through simple techniques, but deep learning has introduced robust models that can automatically generate realistic images from input text. Recent advancements in generative models have led to several techniques for text-to-image synthesis. Although multiple models exist, our project primarily focuses on two: StackGAN, which has maintained a strong position, and the recently emerged Stable Diffusion model. This study includes an exploration of the literature on text-to-image generative models, providing a deeper understanding of these models. Furthermore, the research extends to training related models on common datasets such as LAION-5B, CUB-200-2011, and Oxford 102 Flower, evaluating both the accuracy and quality of the generated images. Finally, our findings show that the diffusion model, with an accuracy of 76%, outperformed the GAN model, which had an accuracy of 56%, leaving room for future enhancements.

Tarushi Khattar, Sara Bare, Tanya, Sakshi Kuyate, Vaishali Wangikar
Optimized Humidity Prediction: A Random Forest and Aquila Optimizer Approach

Weather dynamics of Relative Humidity (RH) are notoriously nonlinear, with outliers and asymmetric error distributions, all of which hinder accurate RH prediction by conventional models. This study addresses these limitations by introducing a random forest optimized with the Aquila Optimizer (RF-AO), a novel hybrid machine learning model. The Aquila Optimizer improves generalization and noise robustness by adapting RF hyperparameters such as tree depth, node splits, and ensemble size to the noise characteristics of meteorological data. Run on daily RH data (2015–2018) from Pahalgam, India, provided by IMD, the RF-AO model reduced the Mean Absolute Error (MAE) to 0.1764 (vs. 8.8863 for standalone RF) and achieved a Willmott's Index (WI) of 0.9901 and an R² of 0.9843 during testing. These improvements stem from the AO's ability to balance exploration and exploitation during optimization, which mitigates overfitting and outlier sensitivity. The results demonstrate the model's suitability for real-time deployment in irrigation planning, HVAC control, and climate resilience strategies. By integrating metaheuristic optimization into ensemble forecasting for RH, this work proposes a scalable and more robust framework applicable across global climatic regions.

Sandeep Samantaray
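The two headline metrics above, MAE and Willmott's Index of agreement, can be computed directly from observed and predicted series. A small sketch (function names are illustrative, not the paper's code):

```python
import numpy as np

def mae(obs, pred):
    """Mean Absolute Error between observed and predicted values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(obs - pred)))

def willmott_index(obs, pred):
    """Willmott's index of agreement (WI); 1.0 means perfect agreement."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    o_bar = obs.mean()
    num = np.sum((obs - pred) ** 2)
    den = np.sum((np.abs(pred - o_bar) + np.abs(obs - o_bar)) ** 2)
    return 1.0 - num / den
```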
Diabetic Retinopathy Classification using Transformer Models: A Comprehensive Survey

Diabetic Retinopathy (DR), a leading cause of blindness and visual impairment, arises from prolonged diabetes mellitus with poor glycemic control, leading to structural damage in the retina. DR is becoming a critical medical challenge, affecting individuals’ vision and overall health. While ophthalmologists can manually diagnose DR, this approach is labor-intensive and time-consuming, particularly in today’s high-demand clinical environments. Early detection and prevention of DR require an automated, precise, and personalized approach using deep learning. Various deep learning techniques have been explored for DR severity classification, with Convolutional Neural Networks (CNNs) being the predominant choice. However, CNNs have limitations in capturing long-range dependencies within retinal images. Transformers, having first demonstrated superior performance in natural language processing, have recently gained prominence in computer vision. Transformers utilize multi-head self-attention mechanisms to model complex contextual interactions between image pixels, addressing the shortcomings of CNNs. This study proposes a transformer-based approach for DR classification, leveraging its self-attention mechanisms to enhance feature extraction and improve diagnostic accuracy. Fundus images are segmented into non-overlapping patches, which are then flattened into sequences and processed through a linear projection and positional embedding technique to retain spatial information. These sequences are subsequently fed into multiple layers of transformer attention mechanisms to generate the final feature representation. In practical clinical applications, transformer-based models can provide ophthalmologists with rapid, precise, and individualized diagnostic insights, facilitating timely medical interventions and improving patient outcomes.

S. Suvalakshmi, B. Vinoth Kumar
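The patching, flattening, linear projection, and positional-embedding steps described above can be sketched in numpy (shapes, seeds, and names are illustrative, not the paper's implementation, which would use learned weights):

```python
import numpy as np

def patch_embed(image, patch, w_proj, pos_emb):
    """Split an HxW image into non-overlapping patches, flatten each,
    apply a linear projection, and add positional embeddings."""
    h, w = image.shape
    patches = (image.reshape(h // patch, patch, w // patch, patch)
                    .transpose(0, 2, 1, 3)
                    .reshape(-1, patch * patch))   # (n_patches, patch*patch)
    tokens = patches @ w_proj                      # linear projection to d_model
    return tokens + pos_emb                        # retain spatial information

rng = np.random.default_rng(0)
img = rng.random((32, 32))                         # toy grayscale fundus crop
d_model, p = 16, 8
W = rng.random((p * p, d_model))
pos = rng.random(((32 // p) ** 2, d_model))
seq = patch_embed(img, p, W, pos)                  # sequence fed to the transformer
```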
VisionAid: A Real-Time System for Object Detection, Text Reading, and Voice Alerts for Visually Impaired Individuals

This manuscript introduces VisionAid, an innovative assistive system designed to enhance the independence of visually impaired individuals by facilitating navigation and environmental interaction. VisionAid integrates cutting-edge technologies, including real-time object detection, Optical Character Recognition (OCR), and Text-to-Speech (TTS), to provide dynamic audio feedback. We conducted extensive experiments using a model pretrained on the COCO (Common Objects in Context) dataset, which contains thousands of real-world images. For object detection, VisionAid leverages YOLOv8 (You Only Look Once), a state-of-the-art deep learning model known for its high accuracy and low-latency performance. This enables the system to accurately detect and identify objects in real time, ensuring reliable feedback for the user. The system also incorporates Tesseract OCR for text recognition, allowing users to access printed or digital text seamlessly. The recognized text is then converted into natural speech using TTS technology, ensuring both visual and textual information are communicated effectively. By combining these capabilities, VisionAid offers an intuitive, accessible means for visually impaired users to interact with and understand their surroundings through auditory feedback.

Saanvi Sanjay, N. Shivani, Soham M Karia, Vaishnavi Mendon, B. V. Poornima
Computation of Fetal Heart Rate Variability from Abdominal ECG Using Adaptive Filtering and Independent Component Analysis

Investigating the fetal electrocardiogram (fECG) is of critical importance for pregnant women, as it allows assessment of fetal health and well-being. Its extraction is generally preferred from abdominal ECG (aECG) recordings, which consist of the fECG, the maternal ECG (mECG), and noise (such as power-line disturbance, motion artifacts, uterine contractions, baseline wander, and high-frequency noise). Accompanying noise in the ECG causes loss of critical information and leads to misdiagnosis. The work presented in this paper extracts a clean fECG from the aECG using independent component analysis (ICA) and adaptive filtering (AF). ICA is a blind source separation (BSS) technique used to estimate multivariate data as a linear combination of statistically independent non-Gaussian source signals. It is also non-parametric and independent of pattern averaging, making it an efficient algorithm for identifying atypical heartbeats in the ECG signal. FastICA (FICA) is a fixed-point iterative algorithm that estimates the independent components (ICs) with maximum non-Gaussianity by minimizing the similarity between them. These ICs are subjected to adaptive filtering, with the direct fECG as the reference signal, to extract a clean fECG. This filtering helps estimate the fECG signal lost during acquisition by canceling the background noise. In this work, an optimally converging least mean square (LMS) algorithm is used with proper selection of the step size. The extracted fECG obtained from the filtering process is post-processed by a Savitzky-Golay filter, followed by a third-order band-pass filter, a derivative filter, and a P-point moving average filter for clear identification of R-peaks. From the R-peak locations, heart rate variability (HRV) has been computed using the Pan-Tompkins algorithm to predict fetal heart abnormalities.
This method is validated on the publicly available PhysioNet (ADFECG) database and obtains an F1-score of 92.68%. The estimated heart rate for the extracted fECG is found to be 78 bpm.

Sanghamitra Subhadarsini Dash, Ashish Biju Varghese, Malaya Kumar Nath
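The LMS adaptive cancellation step can be sketched as a generic noise canceller: FIR weights are adapted so the filtered reference tracks the interference in the primary channel, and the error output is the cleaned signal. This is a toy illustration with a sinusoidal signal, not the paper's fECG pipeline; `mu` is the step size whose proper selection the abstract emphasizes:

```python
import numpy as np

def lms_cancel(primary, reference, n_taps=8, mu=0.05):
    """LMS noise canceller: the error signal e[n] is the cleaned output."""
    w = np.zeros(n_taps)
    out = np.zeros_like(primary, dtype=float)
    for n in range(n_taps, len(primary)):
        x = reference[n - n_taps:n][::-1]   # most recent reference samples first
        y = w @ x                           # estimated interference
        e = primary[n] - y                  # residual after cancellation
        w += mu * e * x                     # LMS weight update
        out[n] = e
    return out

t = np.arange(2000)
ref = np.sin(0.1 * t)            # reference pick-up of the interference
noisy = ref.copy()               # toy case: primary channel is pure interference
clean = lms_cancel(noisy, ref)   # residual shrinks as the filter converges
```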
Evaluation of Novel In-Shoe Strain Gauge Device for Gait Analysis via Data Processing Techniques

Gait analysis is the assessment of walking patterns through the coordination and balance of muscles in the body. It is essential in the diagnosis of neurological disorders and monitoring of patient progress for rehabilitation. Conventional gait analysis is heavily reliant on force plates to measure the Ground Reaction Forces (GRF) that a person exerts. However, these systems are constrained by high costs, constant maintenance, and repeated foot strikes to ensure accurate data. This study establishes an alternative novel in-shoe device that utilizes strain gauges to measure the GRF of a person. It addresses the key limitations of force plates while maintaining the accuracy and precision of the measurements. The device integrates four 3D-printed strain gauge mounts, positioned within the sole of the shoe. This replicates the functionality of force plates by capturing real-time GRF data while walking. Adjustments to the strain gauge positioning allowed for optimized force distribution. The device demonstrates an accuracy of approximately 95%, which has been supported by quantitative metrics like high correlation coefficient and low error rates. Beyond its empirical accuracy, the participant’s comfort while using the shoe was a critical consideration while designing the device. The device’s portability, affordability, and non-invasive design make it an ideal alternative to traditional force plates, particularly for use in clinical studies, rehabilitation, and remote diagnosis of disorders.

Ayaan Shankta, Rejin Jacob, Reetu Jain
Pothole Detection Using YOLOv8 with an Integrated Notification System

Urban roads, especially in populated cities like Mumbai, experience daily wear and tear and get damaged quickly, leading to the formation of potholes. Conventional methods like manual inspection and laser-based systems for pothole detection are labor-intensive, time-consuming, and costly, making deep learning models an efficient and cost-effective solution for automating pothole detection and road maintenance. This research presents a pothole detection system using YOLOv8 (You Only Look Once Version 8), a deep learning model that performs well in real-time object detection by balancing the speed and accuracy of detection. It improves upon its previous model versions such as YOLOv2, YOLOv3, YOLOv4, YOLOv5, and YOLOv7, where the earlier versions achieved a Mean Average Precision (mAP) of 85–90%, while later iterations such as YOLOv5 and YOLOv7 improved the mAP to 94%. In the pothole detection system, we have also integrated a notification system that sends an SMS (Short Message Service) alert with the Global Positioning System (GPS) coordinates (longitude and latitude) of a pothole once it is detected, indicating its potential to contribute to road safety improvements and more streamlined infrastructure maintenance processes. While the current system sends notifications to a personal contact number for demonstration purposes, it is designed to create real-world impact by potentially sending the notification alert to a designated government portal contact number, enabling faster intervention and more efficient road maintenance. As a whole, the pothole detection system achieves a precision of 92.7% and a recall of 87.5%, ensuring that potholes are detected with minimal false positives.

Shanaya Karkhanis, Shreyash Nadgouda, Archana Lakhe
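The precision and recall figures reported above follow the standard detection definitions over true positives, false positives, and false negatives. A minimal sketch with toy counts (not the paper's actual confusion counts):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp)               # fraction of detections that are real
    recall = tp / (tp + fn)                  # fraction of real potholes detected
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = detection_metrics(tp=90, fp=10, fn=15)
```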
A Performance Analysis of RC Filter for the Application of Analog Devices

This study explores the characteristics and applications of RC (resistor-capacitor) low-pass and high-pass filters, focusing on their gain versus frequency response, advantages, limitations, and practical significance in electronic circuits. A detailed theoretical analysis is conducted to understand the role of these filters in signal processing, emphasizing their ability to attenuate specific frequency components and shape signal waveforms. The research also includes an experimental investigation, where an RC circuit is implemented using discrete components and analyzed with the ADALM1000 active learning module and PixelPulse2 software. The cutoff frequency and filtering behavior are examined through direct measurements and the results are compared with MATLAB-based simulations. The experimental results align closely with theoretical predictions, confirming the effectiveness of RC filters in noise reduction, frequency selection, and signal conditioning. By bridging theoretical concepts with practical validation, this study underscores the importance of RC filters in modern electronics. The insights gained from this research provide valuable guidelines for optimizing filter design in applications such as audio processing, communication systems, and embedded electronics.

Rajulapati Sudha, P. Ramesh, Rushitha Reddy Golamaru
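The cutoff frequency and gain-versus-frequency behavior examined in the study follow the first-order relations fc = 1/(2πRC) and |H(f)| = 1/sqrt(1 + (f/fc)²) for the low-pass case. A small sketch with illustrative component values (the paper's actual R and C are not given here):

```python
import math

def rc_lowpass_gain(f, R, C):
    """Cutoff frequency and magnitude response of a first-order RC low-pass."""
    fc = 1.0 / (2 * math.pi * R * C)                 # -3 dB cutoff frequency
    gain = 1.0 / math.sqrt(1.0 + (f / fc) ** 2)      # |H(f)|
    return fc, gain

fc, g = rc_lowpass_gain(f=1_000, R=1_000, C=159e-9)  # fc is close to 1 kHz
# near f = fc the gain approaches 1/sqrt(2), the -3 dB point
```

The high-pass counterpart simply swaps the response to |H(f)| = (f/fc)/sqrt(1 + (f/fc)²).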
Ensemble Simulation Model-Based Animal Intrusion Detection System

Human-animal collisions are becoming more common, and there is a rising need for efficient systems that can promptly alert drivers to potential animal collisions. Street dogs, cats, cattle, and pigs are commonly seen on our streets and are a leading cause of accidents. Peacocks have also been reported to cause major accidents for two-wheeler riders, since they fly at low altitude. Potholes and speed breakers that are difficult to notice also contribute to major road accidents. The severity of these accidents is especially high at night, when animal crossings are very difficult to see. Hence, an effective model is required to quickly and accurately detect animal crossings and alert the driver. In this paper, we present an effective ensemble simulation model that detects animals using thermal images.

B. N. Lohith Kumar, N. Manish, N. V. Uma Reddy, S. Sreejith
Large Language Model Interface for Manipulator Control

Language models are now a prominent research topic in Artificial Intelligence (AI): systems trained to comprehend human communication and converse back in the same way. The Large Language Model (LLM) is an improved version with greater learning capacity that also absorbs sophisticated language structure. Robots are being utilized in every domain to automate processes, but a pivotal challenge is the need for technical expertise to communicate with the robot. This study's main objective is to integrate LLMs into manipulator control systems, i.e., to accept human-language instructions and seamlessly translate them into precise robotic-arm tasks. The study addresses challenges such as interpreting vague inputs, inferring reference frames, and ensuring usability through simple queries without requiring technical expertise. The proposed method is implemented using LLM models and the Robot Operating System (ROS) and tested with multiple manipulators, both in the Gazebo simulator and on real hardware. The model was tested with a series of prompts and achieved a success rate of 87.33%, highlighting the LLM's effective understanding of human commands and the corresponding performance of the robotic system.

N. Preeti, Hema Srivarshini Chilakala, A. A. Nippun Kumaar
Efficient Detection of Vehicles on Indian Roads: A Comparative Performance Analysis of YOLOv8, V9, and V10

Object detection, an essential task in computer vision, involves identifying and locating objects within images and video frames and has seen significant advancements through models like YOLO (You Only Look Once). This study presents a comparison of object detection models trained on a custom dataset consisting of auto-rickshaws and license plates. Utilizing the YOLO models YOLOv8, YOLOv9, and YOLOv10, the study evaluates the performance of their various versions in recognizing and localizing objects. Each model was trained under identical conditions to ensure an unbiased comparison, and the results were analyzed based on performance metrics such as precision, recall, mean average precision, and model complexity. The results highlight that the YOLOv10 models, the medium (YOLOv10m) and the balanced (YOLOv10b), achieve better results, with the former achieving a mAP@50 value of 0.791 and an F1 score of 0.768 for auto-rickshaw detection. The latter achieved a mAP@50 value of 0.739 and an F1 score of 0.731 for license plate detection, with fewer parameters (YOLOv10m: 16.49M; YOLOv10b: 20.45M) than the other models considered. Based on this, the use of the YOLOv10m and YOLOv10b models for future research in object detection tasks is recommended.

Preet Kanwal, Anjan R. Prasad, Prasad B. Honnavalli
DL Based Approach for Assessing the Severity of DR from Retinal Fundus Images

Diabetic retinopathy (DR) is a severe consequence of diabetes and a major cause of vision impairment globally, particularly among working-age individuals. Early detection and timely treatment can significantly prevent vision loss in many individuals with DR. Once DR symptoms are identified, the disease's severity can be assessed to determine the most suitable course of treatment. This manuscript focuses on classifying DR from fundus images by severity level using ResNet, MobileNet, GoogLeNet, and VGG16. These models are considered for analysis due to their proven effectiveness in image classification tasks, robustness in feature extraction, and efficiency in handling medical imaging datasets. ResNet's deep residual connections preserve detailed information, MobileNet's lightweight architecture optimizes speed, and GoogLeNet's inception modules and VGG16's simple convolutional layers make them well suited for DR classification. The models are trained with an experimentally determined learning rate, optimizer, and loss function to achieve higher accuracy. They have been tested on the APTOS 2019 dataset, consisting of 5593 retinal images across 5 classes, and obtained an overall accuracy of 95.89%.

Sidharth Jeyaraj, Malaya Kumar Nath
Music Recommendation System Based on Facial Emotion Recognition

Music recommender systems have become an important application of personalised technology aimed at tailoring content to users' preferences. However, most past systems have relied almost exclusively on users' past interactions and similarity in content, rather than adjusting recommendations in real time based on inputs from the user's end. This project introduces an emotion-aware recommendation system that uses a Convolutional Neural Network (CNN) to recognise the facial emotion of the user, thus creating a more immersive and contextually relevant experience. Following this, the system employs clustering and content-based recommendation methods to predict and recommend songs to users based on their mood.

Kreesha Iyer, Neha Grandhi, Bhagyashree Birje, Priyanka Verma
Indian Sign Language Recognition Using CNN-LSTM Architecture for Enhanced Gesture Prediction

The development of precise automated recognition systems for Indian Sign Language (ISL) faces significant difficulties because ISL gestures exhibit high variability together with complex patterns. Classic neural networks fail to capture both the spatial and temporal properties of these gestures appropriately. Our proposed model uses Convolutional Neural Networks (CNN) to extract spatial features and a Long Short-Term Memory (LSTM) network with a percentage-based attention mechanism to analyze temporal elements. The system analyzes frames through the CNN and employs Attention-LSTM temporal processing to achieve 99% accurate ISL gesture recognition on a complete dataset. This research presents a CNN-Percentage-Based-Attention-LSTM model structure which effectively captures gesture space and motion characteristics while achieving better accuracy than traditional approaches. The attention mechanism embedded in the model directs attention toward essential gesture features to enhance recognition accuracy on both complex and subtle gestural movements. The scalability and robustness of real-time ISL gesture recognition enable the model to operate as a promising communication aid for hearing-impaired individuals across educational, social, and professional domains. The results show how this methodology can solve the present challenges of standard ISL recognition techniques while leading to new developments in this field.

Anshara Beigh, Smriti Kumari, Rebekah Russel, Ali Imam Abidi
A Modified Aggregation Operator and Score Function for Solving Multicriteria Decision Making Problem Under Neutrosophic Environment

Neutrosophic sets are characterized by membership, non-membership, and indeterminacy functions that provide a robust method for modeling incomplete or inconsistent data, which is commonly encountered in real-world decision-making scenarios. This paper proposes a multicriteria decision-making (MCDM) approach for handling uncertain and vague information in decision problems using aggregation operators and score functions within the framework of Neutrosophic Sets (NS). The proposed approach combines aggregation operators, which fuse the multiple criteria and alternatives, with score functions that rank and evaluate the alternatives. The aggregation operators combine the neutrosophic sets associated with each criterion into a single comprehensive evaluation, while the score function derives a crisp ranking of alternatives. A realistic example illustrates the approach's efficacy, showcasing its applicability to complex decision problems under uncertainty and imprecision. On the basis of scoring functions and criteria, a table compares several aggregation functions. Existing approaches from Sachin, Garg, and Nafei et al. are contrasted with the suggested score function. The values of 0.233 and 0.333 produced by the suggested scoring function are similar to those of Sachin and Garg but lower than those of Nafei et al. Three criteria ($C_1, C_2, C_3$) are included in the comparison of the aggregation functions for two alternatives ($A_1, A_2$). The aggregated values for $A_1$ are 0.3268, 0.2000, and 0.3881 when using the current aggregation operator (Ye [9]), whereas the suggested aggregation operator produces 0.1796, 0.1056, and 0.1634, indicating a decrease in values. Likewise, for $A_2$ the current aggregation operator yields 0.5627, 0.1414, and 0.2000, with the suggested operator again producing lower values.
These results suggest that this method offers a flexible and effective tool for decision makers in a situation involving incomplete or conflicting information.

Ritu, Tarun Kumar, M. K. Sharma
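For orientation, score functions of this kind map a single-valued neutrosophic number $A = (T_A, I_A, F_A)$ to a crisp value; one common form from the literature (illustrative only, not the modified score function proposed in the paper) is

$$s(A) = \frac{2 + T_A - I_A - F_A}{3},$$

which rewards the membership degree $T_A$ and penalizes the indeterminacy $I_A$ and non-membership $F_A$, so alternatives with higher $s(A)$ rank higher.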
MedSynGAN: A Federated GAN System for Generating Synthetic Medical Images

MedSynGAN addresses the challenges posed by data scarcity, diversity, and privacy in medical imaging, particularly when generating synthetic chest X-ray images. The proposed system, the Federated Generative Adversarial Network (MedSynGAN), relies on a decentralized approach wherein a model is trained across multiple healthcare sites while patient data remains strictly private to those sites. A significant advantage of this solution lies in its ability to integrate federated learning with advanced GAN architectures, such as DCGAN and ProGAN, for enhanced image quality and stability. The MedSynGAN system offers improved model robustness and several other advantages, such as learning from diverse datasets without compromising sensitive information. Performance is measured using metrics such as Fréchet Inception Distance (FID), with some reported results reaching scores as low as 26.71 for high-quality images, structural similarity indices reaching 0.85, and peak signal-to-noise ratios above 51.29 dB, ensuring high fidelity with applicability in a clinical setting. The synthetic images generated by this system show promise for augmenting training datasets for diagnostic tasks such as pneumonia detection or nodule classification, potentially addressing data imbalance issues in medical image analysis. This research has significant real-world healthcare applications because it caters to medical research and AI training needs while maintaining the critical privacy of patients' data.

Chinmay Inamdar, Arya Doshi, Swadha Joshi, Swati Shilaskar
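The peak signal-to-noise ratio reported above has a standard definition in terms of the mean squared error between two images. A minimal sketch (a textbook definition, not the authors' evaluation code):

```python
import numpy as np

def psnr(img_a, img_b, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((img_a.astype(float) - img_b.astype(float)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 20 * np.log10(data_range) - 10 * np.log10(mse)
```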
Automated Detection of Defects in Solar Images Utilizing Integrated Deep Learning Frameworks

Renewable energy sources must meet rising power demand while protecting the environment, and solar farms are a fast-growing, ecologically friendly power source. However, multiple types of defects arising from ordinary operation or environmental conditions reduce solar energy generation efficiency. Electroluminescence (EL) imaging reveals such defects, but manual defect identification is time-consuming, expensive, and inaccurate. This paper presents an automated deep-learning method for solar defect identification and categorization. Traditional semi-automated machine-learning methods require manual feature extraction; the proposed detection and classification approach instead operates on EL images directly. The system comprises three phases: pre-processing, Convolutional Neural Network (CNN)-based segmentation and feature extraction, and Long Short-Term Memory (LSTM)-based classification of solar abnormalities. The model pre-processes the training and test solar images before autonomous deep-learning feature extraction and categorization. Gaussian filtering and contrast adjustment are the main distortion-correction approaches during pre-processing; with distortion correction, the CNN estimates stronger and more reliable features, improving detection accuracy. The CNN layers are further enhanced by applying the Discrete Cosine Transform (DCT) and Independent Component Analysis (ICA) to extract and reduce robust features. Finally, classification is performed using the LSTM classifier. The performance of the proposed model is compared with existing methods on two datasets, and the proposed model improves accuracy by 4.37% over existing techniques.

Dhanashree Kulkarni, Preeti P. Kale, Hemant B. Mahajan, Priya Pise, Sulbha Yadav, Smita Desai
Multiclass Botnet Detection in IoT Smart Home Using Deep and Ensemble Learning Techniques

In the rapidly expanding domain of the Internet of Things (IoT), smart homes have become increasingly prevalent, integrating various interconnected devices that enhance convenience, security, and energy efficiency. Nevertheless, this interconnectedness also introduces notable security challenges, including the threat of botnet attacks, which have the potential to jeopardize entire networks of devices. With the increasing quantity and variety of interconnected devices in smart homes, it is imperative to implement strong security measures to safeguard these systems. This work aims to improve the security of IoT smart homes by assessing the efficacy of several deep learning algorithms in detecting and classifying multiclass botnet attacks. We employed the Bot-IoT dataset to implement and evaluate the effectiveness of three advanced deep learning models, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN), and two Ensemble Learning (EL) methods, namely Gradient Boost (GB) and AdaBoost (AB). The primary objective was to determine which models offer the most reliable protection against sophisticated cyber threats targeting smart home environments. The experimental findings demonstrate that the DL models consistently outperformed the EL models across several performance metrics, including accuracy, sensitivity, false positive rate (FPR), false negative rate (FNR), Matthews correlation coefficient (MCC), and Area Under the Curve (AUC). The results emphasize the capability of DL models to greatly enhance the security of IoT smart homes, offering a robust defense mechanism against the changing nature of botnet attacks. This study emphasizes the crucial importance of utilizing modern DL methods to protect increasingly interconnected and vulnerable IoT ecosystems.

Haifa Ali Saeed Ali, J. Vakula Rani, Binay Budhathok
Evaluating the Performance of SVM and Random Forest in Air Quality Monitoring and Prediction

Air pollution is becoming a major threat in Indian cities, with serious consequences for both the environment and individual living standards. The major pollutants are particulate matter (PM 10 and PM 2.5), SO2, CO, NO2, O3, NH3, Pb, Ni, As, benzo(a)pyrene, and benzene; inhaling these hazardous substances leads to severe health issues. Predicting the air pollution of a particular area or city can help the government take appropriate measures to reduce it, and the public can protect themselves if they receive alerts about pollution. Here, the Random Forest Algorithm (RFA) and Support Vector Machine (SVM) algorithms are applied to predict pollutants such as PM 2.5, PM 10, CO, and SO2. The performance is evaluated by measuring the Root Mean Square Error (RMSE). Prediction graphs show that the true values and the values estimated by the models lie close to each other, indicating the accuracy of the algorithms. The error for SVM with a non-linear kernel was 0.92, and for the Random Forest algorithm it was 0.91. The results show that machine-learning algorithms can be utilized effectively to predict the AQI.
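The abstract's headline numbers are root mean square errors. As a minimal illustration of how RMSE compares a model's forecasts against observations (the readings below are invented for illustration, not the chapter's data):

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical PM 2.5 readings vs. model predictions (illustrative only)
observed  = [35.0, 42.0, 50.0, 38.0]
predicted = [34.1, 43.2, 48.9, 38.8]
print(round(rmse(observed, predicted), 3))  # 1.012
```

A lower RMSE means the prediction curve tracks the true values more closely, which is how the chapter ranks SVM (0.92) against Random Forest (0.91).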

G. Arthy, M. Malathi, P. Sinthia, P. Nagarajan, N. Ashokkumar, Kavitha Thandapani
Hybrid Dehazing Method for Low-Bandwidth Satellite Images Based on Generative Adversarial Network

Remote sensing using satellite images plays a crucial role in gathering data about the Earth's surface, but haze degradation at low bandwidth can significantly compromise image clarity and utility. This paper introduces a hybrid dehazing method designed specifically for low bandwidth satellite images using Generative Adversarial Networks (GANs). The approach enhances the visual quality of hazy satellite imagery by integrating deep learning and advanced image processing techniques to counteract atmospheric effects while preserving essential image details necessary for remote sensing applications. Through experimental analysis, the proposed method exhibits its ability to produce clear, high-fidelity satellite images even under dense haze conditions. Compared with state-of-the-art methods such as DCP, Dehaze-Net, U-Net, and AIDED-Net, the hybrid approach shows a reduction in root mean square error of approximately 13% to 15% across different haze intensities and an average improvement of 34% in the structural similarity index measure. For dense hazy images, the method achieves a minimal AMSE value of 4.2, outperforming other techniques that fall in the range of 4.4 to 5.7. These results confirm the proposed method as a promising solution for enhancing satellite image quality across varying environmental conditions.

B. A. Sabarish, R. Aarthi, R. Dhamayandhi, Akshaya Sajith
Fibonacci Based Security Algorithm for ECG Signal

The rising dependence on telemedicine and wearable health devices has made the secure transmission of sensitive biomedical information, such as Electrocardiogram (ECG) signals, a critical concern. In this paper, we propose a Fibonacci-based security algorithm to guarantee the confidentiality and integrity of ECG signals during transmission. The proposed algorithm leverages the unique properties of the Fibonacci sequence to create a lightweight and effective encryption mechanism tailored to the real-time requirements of ECG data. By combining Fibonacci transformations with conventional cryptographic techniques, the proposed method achieves enhanced security with negligible computational overhead, making it suitable for resource-constrained devices. The algorithm is evaluated in terms of encryption strength, computational efficiency, and ability to withstand cryptographic attacks. Experimental results demonstrate that the Fibonacci-based approach secures ECG signals while preserving the quality of the original clinical data, guaranteeing both security and accuracy in remote health monitoring systems, with a signal-to-noise ratio greater than 50 dB.
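The chapter's actual cipher construction is not reproduced here. Purely as an illustrative sketch of the general idea (a Fibonacci-derived keystream combined with the signal; this is a toy, not the authors' scheme, and XOR alone is not cryptographically secure):

```python
def fib_keystream(n, a=1, b=1):
    """Generate n keystream bytes from a Fibonacci sequence (illustrative only)."""
    out = []
    for _ in range(n):
        out.append(a % 256)
        a, b = b, a + b
    return out

def xor_transform(samples, key):
    """Symmetric XOR transform: applying it twice restores the input."""
    return [s ^ k for s, k in zip(samples, key)]

ecg = [120, 131, 140, 128, 117]           # hypothetical 8-bit ECG samples
key = fib_keystream(len(ecg))
cipher = xor_transform(ecg, key)
assert xor_transform(cipher, key) == ecg  # round-trip recovers the signal exactly
```

The lossless round trip is what lets a scheme of this family preserve the clinical waveform (and hence the reported signal-to-noise ratio) after decryption.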

A. Electa Alice Jayarani, Helen K. Joy, S. Thenmozhi, V. Sangeetha, Bhargavi Ananth
Hybrid CNN-LSTM with Attention Mechanism for Medical Visual Question Answering

Medical Visual Question Answering (MedVQA) has become an important research field at the convergence of medical imaging and natural language processing, owing to its potential to assist medical practitioners by providing precise answers to image-related questions. This paper introduces a hybrid framework combining a CNN and an LSTM with an attention mechanism for question answering over medical images. The framework processes medical images and interprets natural language questions to generate relevant answers. The multimodal architecture uses a Convolutional Neural Network (CNN) to extract features from the image and a Long Short-Term Memory (LSTM) network to handle the question and answer sequences. Multimodal fusion in the framework was performed using an attention mechanism. The model was trained and evaluated on two datasets: ImageCLEF VQA-MED 2019 and VQA-RAD 2019. On the VQA-MED 2019 dataset, the model achieved a training accuracy of 98%, a validation accuracy of 61%, and a testing accuracy of 60% across 69 classes. For the VQA-RAD dataset, the model attained a training accuracy of 98%, a validation accuracy of 70%, and a testing accuracy of 74% across 8 classes.

Vandana Ratwani, Jitendra Bhatia, Jitali Patel
A Hybrid CNN-CapsNet Pipelined Approach For Disease Diagnostic With Severity Estimation

This research presents a novel and robust approach for the disease classification of leaves by integrating YOLOv9 with attention-enhanced CNNs such as VGG-19, EfficientNetV2B0, and ResNet101, along with a hybrid Capsule Network (CapsNet) built on a ResNet50 backbone with a lightweight attention mechanism. YOLOv9 is trained to detect and segment individual leaves from input images, each of which is then passed to the enhanced CNNs or the hybrid CapsNet for disease classification and severity estimation. The proposed hybrid approach emphasizes local hierarchical features while preserving base CNN-extracted features, representing a significant advancement over existing methods, as such a pipelined integration of YOLOv9 with CapsNets has not been considerably explored in prior literature. The enhanced CNN models achieved high validation accuracies, with VGG-19 reaching 98.69%, EfficientNetV2B0 achieving 99.68%, and ResNet101 scoring 99.55%, while the hybrid CapsNet outperformed all with an accuracy of 99.81%. The disease classification was conducted using the publicly available PlantVillage dataset consisting of 67,000 images, while a custom dataset for leaf detection was curated from Kaggle and Roboflow sources. The proposed method not only improves classification accuracy but also contributes to practical agricultural benefits by enabling early disease detection and severity analysis, potentially reducing pesticide usage and enhancing crop management.

Suhaib Aalam Bhat, Saliq Neyaz, Sidrat Shafiq Khan, Yash Paul, Rajesh Singh
Bird Species Recognition Using YOLOv8: A Deep Learning Approach for Habitat Conservation and Preservation

Accurate identification of bird species is essential for conservation and ecological studies. This study proposes a real-time bird detection system using an advanced deep learning model, YOLOv8, developed to perform well against ecological obstacles. The model was trained on a custom dataset, created and labeled through Roboflow, at a resolution of 640 × 640 pixels over 50 epochs. The model exhibited an inference speed of 2.9 milliseconds per image and achieved a mean Average Precision (mAP 50) of 94.3%, suggesting greater accuracy and efficiency than earlier models such as YOLOv5 and Faster R-CNN. The model was evaluated on several bird species, including the Java Sparrow, Nicobar Pigeon, and Golden Eagle, and proved reliable under low ambient light and against cluttered backgrounds. It can also be deployed on edge devices such as the NVIDIA Jetson Nano, making it suitable for large-scale ecological monitoring. By providing real-time knowledge of species distribution, population trends, and habitat change, this solution supports conservationists and wildlife researchers in making sound, data-driven environmental decisions. Keywords: Bird species identification; YOLOv8; object detection; biodiversity.

Vamsi Krishna Karanam, Venkata Sai Abinay Kommuri, Harsha Vardhan Reddy Lekkala, Joshuva Arockia Dhanraj
Federated Deep Learning Framework for Efficient Detection of Diabetic Retinopathy

Diabetic Retinopathy (DR) is one of the major causes of vision loss and blindness in people who have been diagnosed with diabetes, and early diagnosis is always helpful for effective treatment. This research proposes a federated learning framework that combines MobileNetV2 and ResNet50 architectures to detect and classify DR; this method also strengthens data privacy. To overcome class imbalance, the proposed method applies a weighted loss function, PCA provides dimensionality reduction, and the data is augmented to develop a stronger model. Using federated learning, the framework enables distributed training of models on separate data while still maintaining patient confidentiality and good accuracy. The research suggests that bringing federated deep learning into hospitals can create practical, secure, and scalable ways to address problems in medical imaging. This work addresses the obstacles to spotting diabetic retinopathy, including gathering large amounts of annotated data, using complex deep learning methods, and embedding these methods in day-to-day medical practice. Under federated learning, the ResNet50 model achieves 4.35% higher precision, 12.9% better recall, 15.94% greater F1-score, 20.97% better accuracy, and 2.67% greater AUC-ROC than lighter models such as MobileNetV2. Existing patterns and possible future directions are then discussed, with an emphasis on methods to improve the diagnosis of Diabetic Retinopathy.

K. Senthur Kumaran, Keerthika Periasamy, V. Swathi Reddy, Thanu Athitya Mohankumar
Machine Learning-Driven Strategies for Laboratory Diagnostic Pathway Optimization

The Hepatitis C Virus (HCV) is a blood-borne infection that mostly affects the liver. If untreated, it can cause cirrhosis, chronic liver disease, or liver cancer. The risk of serious liver damage rises when early signs are absent, making prompt detection difficult. For treatment and patient management to be successful, an early and precise diagnosis is necessary. Based on the outcomes of laboratory tests, different machine-learning approaches were used in this work to categorize blood donors and Hepatitis C patients. The dataset includes 615 instances and 12 features. Mean/mode imputation was used for data preprocessing to handle missing values. The most important characteristics for categorization were identified using feature selection approaches. Hepatitis C infection was predicted using a variety of machine learning models, such as ensemble learning, random forest, decision trees, and support vector machines. To guarantee a thorough examination of model efficacy, the models’ performance was assessed using accuracy, precision, recall, F1-score, ROC curve, and confusion matrix. According to the results, Ensemble Learning had the lowest accuracy of 87.8%, while the Decision Tree classifier outperformed the other models with the best accuracy of 99.5%. The work illustrates the potential of ML-based predictive models in medical applications by highlighting the effects of several machine-learning techniques on HCV detection. By lowering the chance of complications and assisting medical professionals in making well-informed judgments, the findings help to improve early detection techniques.
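The evaluation metrics named above have standard definitions from the binary confusion matrix. A minimal sketch (the confusion-matrix counts below are invented for illustration, not the study's results):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # of predicted positives, how many were real
    recall = tp / (tp + fn)             # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for an HCV classifier (illustrative only)
acc, prec, rec, f1 = classification_metrics(tp=45, fp=3, fn=2, tn=550)
```

On imbalanced medical datasets such as this one (few infected cases among many healthy donors), precision, recall, and F1 are more informative than raw accuracy, which is why the chapter reports all of them.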

Rishithaa Maligireddy, Jayaprakash Vemuri
Hybrid Deep Learning for Meme Sentiment and Emotion Analysis Using LLMs

Memes have become a dominant form of digital communication, blending images and text to convey emotions, humor, sarcasm, or even controversial viewpoints. While memes enhance online engagement, they also pose risks by spreading hate speech and inciting violence. To address this, we utilize Artificial Intelligence (AI) to analyze and classify meme emotions. Our approach employs GPT-4 for deep contextual understanding, enabling precise sentiment categorization. The implemented model identifies and quantifies seven emotional categories based on meme content analysis: happy (0), sad (0), angry (0), surprised (2), neutral (2), fearful (2), and disgusted (1). The results demonstrate the effectiveness of AI in detecting nuanced emotional cues, aiding the development of responsible content moderation systems. This research enhances digital safety while preserving the expressive nature of memes through data-driven sentiment classification.

D. Swapna, M. Shanmuga Sundari, T. Nandini, S. K. Nyasa, M. Bhavya Bhavika
Advancements, Challenges, and Recent Trends in Facial Expression Recognition Systems: A Comprehensive Review

Facial Expression Recognition (FER) is central to human-computer interaction and affective computing, and it has evolved significantly in recent years, with both Machine Learning-Based Face Expression Recognition Systems (ML-FERS) and Deep Learning-Based Face Expression Recognition Systems (DL-FERS) playing pivotal roles. This paper presents a comprehensive survey that explores the latest developments in Facial Expression Recognition Systems (FERS), encompassing both deep learning-based and traditional machine learning-based approaches. It provides an elaborate examination of the critical components of FERS, including data acquisition and preprocessing, feature extraction, emotion modeling, and evaluation metrics. By analyzing the strengths and limitations of these techniques, the survey offers insights into their applicability across different domains and practical scenarios, and it provides a valuable resource for researchers, practitioners, and decision-makers, facilitating informed decision-making and responsible development in this dynamic and influential area of technology.

C. Sheeba Joice, S. Hitha Shanthini, C. Jenisha
Hyperspectral Image Feature Extraction Using A Light Bidirectional Encoder Representations from Transformers

The rich spectral information stored in every pixel of hyperspectral imaging (HSI) has made it popular in various real-world applications. The nonlinear connection between correlated HSI data items and the generated spectral data produces complex classification results that are difficult to achieve using conventional methods. Researchers have recognized that spectral information alone is insufficient to classify HSI data, so spatial information needs to be incorporated to improve the classification outcomes. This paper employs BERT and ALBERT models, recently proposed in natural language processing (NLP), as feature extraction models to acquire spectral information. These models use spectral signatures to learn the relationship between pixels and their neighbors, and they seek to understand context by learning the relationship between tokens. Despite this, BERT and ALBERT might not effectively use spatial information. This paper therefore incorporates spatial information into BERT and ALBERT and proposes Spatial-BERT and Spatial-ALBERT. Here, we efficiently integrate each target pixel's spatial and spectral information in the HSI data for these models to improve the classification performance. The experimental results obtained on a publicly available dataset, the University of Pavia, demonstrate that the Spatial-ALBERT model achieves satisfactory performance with few parameters and relatively better efficiency than the Spatial-BERT model. The results show that Spatial-ALBERT outperformed the existing CNN- and RNN-based methods and performed better than Spatial-BERT in twelve out of twenty tests.

Rajat Kumar Arya, Pratik Chattopadhyay, Rajeev Srivastava
An Empirical Analysis of Machine Learning and Deep Learning for Stock Market Forecasting

This paper compares various machine learning (ML), Prophet, and deep learning (DL) models’ performances in forecasting stock market trends using different datasets. The research uses the NVIDIA Corporation (NVDA) data from January 1, 2019, to January 1, 2024, showing forecasted trends and performance measures of each model. The Random Forest model performed best on important regression metrics: Mean Absolute Error (MAE = 0.31211), Mean Squared Error (MSE = 0.37742), Root Mean Squared Error (RMSE = 0.61434), R-squared (0.99986), and Mean Absolute Percentage Error (MAPE = 1.07827). At the same time, the LSTM model performed the best on risk-adjusted measures such as Sortino Ratio (6.26923), Jensen’s Alpha (1.51894), Maximum Drawdown (1.25614), and Mean Directional Accuracy (0.65088). The Decision Tree Regressor also exhibited high performance in Directional Accuracy (0.98857). The results emphasize the quality of ML and DL models for stock market forecasting to facilitate data-informed decisions by investors.
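The regression metrics cited above (MAE, MAPE, R-squared) are standard and can be computed directly from forecasts. A minimal sketch (the prices below are invented for illustration, not the NVDA data):

```python
def regression_metrics(y_true, y_pred):
    """MAE, MAPE (%), and R-squared, as used to compare forecasting models."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot  # 1.0 means the forecast explains all variance
    return mae, mape, r2

# Hypothetical closing prices vs. forecasts (illustrative only)
mae, mape, r2 = regression_metrics([100.0, 102.0, 104.0], [99.0, 103.0, 104.5])
```

MAE and MAPE measure average error size (absolute and relative), while R-squared near 1, as reported for the Random Forest model, indicates the forecast explains nearly all of the variance in the series.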

N. J. Jesan, R. Rahul Ganesh, T. Gireesh Kumar, I. Praveen
Robust Cockpit Panel Image Processing for Shape Analysis Using Deep Learning-Based Shape Classification and Transfer Learning

Reliable identification of shapes in aircraft cockpit panels is critical for ensuring operational safety and supporting automated fault detection systems. Though effective for basic shapes, traditional rule-based methods tend to underperform when faced with variable image quality or orientation. This paper presents a deep learning-based framework enhanced by transfer learning to address these limitations. A convolutional neural network (CNN) is trained using a hybrid dataset comprising 40,000 generic geometric images and 8,000 cockpit-specific samples generated via augmentation. The model's performance is significantly improved by fine-tuning it on domain-specific data, achieving 100% classification accuracy during validation. The proposed method enhances accuracy, adaptability, and robustness in real-world cockpit scenarios and outperforms traditional methods.

Joseph Chakravarthi Chavali, D. Abraham Chandy
SegCSA-X: A Robust Segmentation Model for Palmprint Images Under Illuminated Challenges

Image segmentation plays a critical role in computer vision applications such as biometric verification, medical imaging, and automated surveillance. However, existing segmentation methods struggle in challenging lighting conditions, where illumination variations introduce distortions. To address this issue, we propose Segmentation with Channel and Spatial Attention - Palmprints (SegCSA-X), a novel deep learning-based segmentation framework designed to enhance robustness against illumination-induced artifacts. The model integrates channel and spatial attention mechanisms to improve feature extraction and segmentation accuracy. SegCSA-X is evaluated against the Segment Anything Model (SAM) using three publicly available palmprint datasets: CASIA, REST, and IITD. While both models perform well on CASIA, SegCSA-X achieves superior results with an IoU of 98.88% and Dice Score of 99.44%. On the more challenging REST and IITD datasets, which contain significant illumination variations, SegCSA-X significantly outperforms SAM, achieving IoU scores of 96.47% (IITD) and 97.22% (REST), and Dice scores of 98.20% (IITD) and 98.58% (REST). These results demonstrate that SegCSA-X establishes a new benchmark for palmprint segmentation, particularly in adverse lighting conditions.

Rinkal Jain, Chintan Bhatt, Shakti Mishra
Evaluating the Impact of PCA-Based Feature Extraction on Predicting Customer Attrition in the Banking Sector

This study investigates customer attrition prediction in the credit card segment of the banking industry, with a focus on enhancing classification performance through advanced feature engineering techniques. The preprocessing pipeline includes encoding of categorical variables, normalization of numerical features, dimensionality reduction via mixed-type Principal Component Analysis (PCA), and K-means clustering to identify latent customer segments. Feature importance visualizations are leveraged to guide model refinement and interpretability. Among the evaluated models, Extreme Gradient Boosting (XGBoost) demonstrates superior predictive performance, achieving an overall accuracy of 95.82%, sensitivity of 84.17%, and a balanced accuracy of 91.10%. Notably, XGBoost exhibits robust specificity (98.03%) and recall (97.04%) in identifying attrited customers, outperforming other classifiers such as Random Forest and Support Vector Machines (SVM), which show marginal declines in performance post-feature selection. Overall, the findings highlight the critical role of feature selection and engineering in optimizing churn prediction models.

N. Siyad, Sunu Mary Abraham, Ann Baby, Jaya Vijayan
FocusNet: A Pathogenetically Oriented Deep Learning Framework for Enhanced Diagnostics and Treatment of Fundus Pathologies

Eye diseases causing irreversible vision loss or blindness are one of the biggest health issues across the globe. Vision loss can be prevented by early and accurate diagnosis with proper treatment. This paper investigates three distinct models, MobileNet, ResNet, and DenseNet, along with their respective variants (MobileNet V1 and V2, ResNet-50, ResNet-101, DenseNet-121, and DenseNet-169). Additionally, a pixel-wise attention mechanism is integrated with all the selected models. According to the experimental results, all the models incorporating the attention mechanism yielded good results. The findings of our study demonstrate that FocusNet, based on the DenseNet-169 architecture with an attention mechanism, achieved the highest accuracy of 95%, followed by DenseNet-121 with 93%, MobileNet V2 with 91%, and ResNet-101 with 73%, while ResNet-50 lagged with an accuracy of 51%. These findings highlight the effectiveness of attention mechanisms with deep learning models for reliable eye disease classification. This study also underscores the potential of attention-driven deep learning frameworks in diagnosing ophthalmic diseases.

R. Bhuvanya, A. Saravanan, V. Vanitha, K. P. Koushik, S. Heblin Bersilla, R. Bharani Rajan
Optimizing Tomato Disease Classification Using Deep Learning Ensemble Approach with Color Opponency Space

One of the most significant crops grown in India is the tomato. Many deep learning models have found widespread application in the precise categorization of various tomato diseases. The deep learning plant pathology models are based on the popular convolutional neural network architectures such as Inception v3, DenseNet121, and ResNet50. This paper aims to improve the prediction accuracy of these three neural networks by using them with the global color constancy approach known as color opponency space (COS), employing hue, saturation, and value. Furthermore, these three models are applied in combination approaches using ensemble learning techniques such as soft voting and weighted voting to find the best-performing combination in terms of accuracy. Inception v3 with COS, DenseNet121 without COS, and ResNet50 with COS are the recommended configurations. This combination achieves 97.74% accuracy, which is higher than any other combination of these three models. This approach demonstrates the potential of hybrid ensemble-CNN frameworks in elevating plant disease classification accuracy for real-world agricultural applications.

Gurpreet Singh, Sandeep Sharma
CNN-Autoencoder with Linear Regression-Based Image Analyzer for Detection of Defects in SQ59 Armatures

During the production of armatures, many types of structural defects arise, making effective quality-assurance techniques necessary. We propose a CNN-Autoencoder with a linear regression-based image analyzer for detecting defective armatures, using a dataset of over 500 images. A CNN is used for feature extraction and for reducing data dimensionality. An autoencoder detects anomalies by learning from defect-free armature pieces and segregating the defective ones. Finally, a linear regression layer performs classification based on intra-variations in the image distribution. The features for the regression model were generated using various statistical techniques, in which the intra-image variation and distribution were considered. The F1-score achieved was 80.01%.

Suraj Sunil Joshi, Devarshi Anil Mahajan, Atharva Deshmukh, Mukta Dinesh Deore, Pooja Mishra, Piyush Jadhav
Project Management with Tamper-Proof Evaluation System Using Blockchain and Secured Storage

In prevalent project management systems, employee performance evaluation is a crucial component for preserving efficiency and growth. However, these systems often fail to provide precise, candid, and tamper-proof evaluations, which leads to wrongful analysis and irregularities. This paper proposes a blockchain-based tamper-proof evaluation system for project management, combined with protected storage using IPFS, to improve the credibility and safety of performance data. The system uses three key components: a private blockchain network for consistent data recording, automated code-evaluation tools including MOSS and Diffchecker for unbiased analysis, and IPFS-based decentralized storage for the data to be evaluated. The system focuses on improving data safety and is designed to reduce biased evaluation by 70–80% compared to traditional methods. It also includes a mechanism for managerial supervision, which allows managers to alter assessments in case the evaluation tool's assessment contains an error. Unlike current systems, every modification is cryptographically signed, guaranteeing trackability while preventing unfairness. The blockchain layer is designed using a permissioned consensus mechanism, where only authenticated nodes participate in transaction validation and block propagation, ensuring robust access control and data integrity. We also use IPFS to access previous performance data while ensuring safety and independence from centralized servers. To preserve authenticity, all changes made by a manager are documented and forwarded to higher authorities for review. This regulated flexibility balances automation with human judgement, ensuring fairness in performance assessment and enhancing the impartiality of employee evaluations. Moreover, employees benefit from a less stressful, healthier, and more balanced workplace.
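The consensus and storage layers described above are system-specific, but the core tamper-evidence property (each record's hash covering its predecessor, so any retroactive edit breaks the chain) can be sketched with stdlib hashing as a stand-in for the actual blockchain; the record fields are hypothetical:

```python
import hashlib
import json

def add_block(chain, record):
    """Append a record whose hash covers the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    chain.append({"record": record, "prev": prev_hash,
                  "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(chain):
    """Recompute every hash; any tampered block breaks the chain."""
    prev_hash = "0" * 64
    for block in chain:
        payload = json.dumps({"record": block["record"], "prev": prev_hash},
                             sort_keys=True)
        if (block["prev"] != prev_hash
                or hashlib.sha256(payload.encode()).hexdigest() != block["hash"]):
            return False
        prev_hash = block["hash"]
    return True

chain = []
add_block(chain, {"employee": "E01", "score": 87})
add_block(chain, {"employee": "E02", "score": 91})
assert verify(chain)
chain[0]["record"]["score"] = 99   # tampering with an earlier evaluation...
assert not verify(chain)           # ...is immediately detected
```

In the proposed system, authorized managerial corrections would be recorded as new signed blocks rather than edits to old ones, which is what makes every change trackable.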

S. Sudharsana Saravanan, C. Swetha, S. Shravanthi, S. R. Sarvanthikha, U. Gayathri, N. Harini
Edge-Optimized Hybrid Framework for Image Super-Resolution Using Deep Learning and Fuzzy Logic

Image super-resolution (SR) addresses critical needs in medical diagnostics and geo-spatial analysis by enhancing low-resolution imaging data. While deep learning methods like Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) achieve high visual quality through adversarial training, they exhibit limitations in preserving anatomically critical edges in MRI scans and topographic features in satellite imagery. This paper presents a novel three-stage architecture that combines the generative capabilities of ESRGAN with a fuzzy inference system for edge optimization. The framework demonstrates a 7.8% improvement in the peak signal-to-noise ratio and 12% higher edge preservation scores compared to baseline ESRGAN on the DIV2K benchmark. The hybrid approach enables interpretable edge enhancement through 27 fuzzy rules that govern gradient-map optimization, addressing key limitations of purely data-driven methods.

Ananya Vemula, Amit Kumar Bairwa
Backmatter
Title
Computer Vision and Robotics
Edited by
Harish Sharma
Abhishek Bhatt
Chirag Modi
Andries Engelbrecht
Copyright Year
2026
Electronic ISBN
978-3-032-06253-6
Print ISBN
978-3-032-06252-9
DOI
https://doi.org/10.1007/978-3-032-06253-6

The PDF files of this book were created in accordance with the PDF/UA-1 standard to improve accessibility. This includes support for screen readers, described non-text content (images, graphics), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognize the importance of accessibility and welcome inquiries about the accessibility of our products. For questions or accessibility needs, please contact us at accessibilitysupport@springernature.com.
