
Computer Vision and Robotics

Proceedings of CVR 2025, Volume 2

  • 2026
  • Book

About this book

This book collects high-quality research articles in the field of computer vision and robotics presented at the International Conference on Computer Vision and Robotics (CVR 2025), organized by the National Institute of Technology, Goa, India, during 25–26 April 2025. The book discusses applications of computer vision and robotics in fields such as medical science, defence, and smart city planning, and presents recent work from researchers, academicians, industry, and policy makers.

Table of Contents

  1. MedSynGAN: A Federated GAN System for Generating Synthetic Medical Images

    Chinmay Inamdar, Arya Doshi, Swadha Joshi, Swati Shilaskar
    Abstract
    MedSynGAN addresses the challenges posed by data scarcity, diversity, and privacy in medical imaging, particularly when generating synthetic chest X-ray images. The proposed system, Federated Generative Adversarial Network (MedSynGAN), relies on a decentralized approach wherein a model can be trained across multiple healthcare sites while patient data remains strictly private to those sites. A significant advantage of this solution lies in its ability to integrate federated learning with advanced GAN architectures, such as DCGAN and ProGAN, for enhanced image quality and stability. The MedSynGAN system offers improved model robustness and other advantages, such as learning from diverse datasets without compromising sensitive information. Performance is measured using metrics such as Fréchet Inception Distance (FID), with reported scores as low as 26.71 for high-quality images, structural similarity indices reaching 0.85, and peak signal-to-noise ratios above 51.29, indicating high fidelity and applicability in a clinical setting. The synthetic images generated by this system show promise for augmenting training datasets for diagnostic tasks such as pneumonia detection or nodule classification, potentially addressing data imbalance issues in medical image analysis. This research has substantial real-world healthcare applications because it caters to medical research and AI training needs while maintaining the critical privacy of patients' data.
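The fidelity figures quoted above (PSNR, SSIM, FID) are standard image-quality metrics. As a minimal illustration of how one of them is computed (not the authors' evaluation code), the peak signal-to-noise ratio between a reference and a generated image can be sketched in plain Python:

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio between two equal-sized grayscale images.

    Images are given as flat lists of pixel intensities in [0, max_value].
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images have unbounded PSNR
    return 10 * math.log10(max_value ** 2 / mse)

# A synthetic 4-pixel "image" and a near-identical copy:
ref = [100, 120, 140, 160]
gen = [101, 119, 141, 159]
quality = psnr(ref, gen)
```

Higher PSNR means smaller pixel-wise error; values above roughly 40 dB are generally considered high fidelity.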
  2. Automated Detection of Defects in Solar Images Utilizing Integrated Deep Learning Frameworks

    Dhanashree Kulkarni, Preeti P. Kale, Hemant B. Mahajan, Priya Pise, Sulbha Yadav, Smita Desai
    Abstract
    Renewable energies must meet rising power demand while protecting the environment. Solar farms are a fast-growing, ecologically friendly power source, but multiple solar-cell flaws arising from ordinary operation or environmental conditions reduce solar energy generation efficiency. Electroluminescence (EL) imaging reveals these defects, yet manual defect identification is time-consuming, expensive, and inaccurate. An automated deep-learning method for solar failure identification and categorization is presented in this paper; traditional semi-automated machine-learning methods require manual feature extraction. The suggested solar defect detection and classification approach uses several EL images. The system comprises three phases: pre-processing, Convolutional Neural Network (CNN)-based segmentation and feature extraction, and Long Short-Term Memory (LSTM)-based classification of solar abnormalities. The proposed model pre-processes training and test solar images before autonomous deep-learning feature extraction and categorization. Gaussian filtering and contrast adjustment are the main distortion-correction approaches during pre-processing. With distortion correction, the CNN extracts stronger and more reliable features, improving detection accuracy. The CNN layers are further enhanced by applying Discrete Cosine Transform (DCT) and Independent Component Analysis (ICA) to extract and reduce robust features. Finally, classification is performed using an LSTM classifier. The performance of the proposed model is compared with existing methods using two datasets, and the proposed model improves accuracy by 4.37% compared to existing techniques.
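The pre-processing stage described above (contrast adjustment plus Gaussian-style smoothing) can be sketched, purely as an illustration of the idea rather than the paper's implementation, on a single row of pixel intensities:

```python
def contrast_stretch(pixels, out_min=0.0, out_max=255.0):
    """Linearly rescale pixel intensities to span [out_min, out_max]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [out_min] * len(pixels)
    scale = (out_max - out_min) / (hi - lo)
    return [out_min + (p - lo) * scale for p in pixels]

def smooth_3tap(pixels, kernel=(0.25, 0.5, 0.25)):
    """Apply a small Gaussian-like 3-tap kernel, replicating border pixels."""
    padded = [pixels[0]] + list(pixels) + [pixels[-1]]
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(pixels))]

row = [10, 10, 200, 12, 11]   # one row of an EL image with a bright spike
stretched = contrast_stretch(row)
smoothed = smooth_3tap(row)
```

In a real pipeline these operations run over full 2-D images (e.g. with a true 2-D Gaussian kernel), but the one-dimensional version shows the mechanics.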
  3. Multiclass Botnet Detection in IoT Smart Home Using Deep and Ensemble Learning Techniques

    Haifa Ali Saeed Ali, J. Vakula Rani, Binay Budhathok
    Abstract
    In the rapidly expanding domain of the Internet of Things (IoT), smart homes have become increasingly prevalent, integrating various interconnected devices that enhance convenience, security, and energy efficiency. Nevertheless, this interconnectedness also brings notable security complexities, including the threat of botnet attacks, which have the potential to jeopardize entire networks of devices. With the increasing quantity and variety of interconnected devices in smart homes, it is imperative to implement strong security measures to safeguard these systems. This work aims to improve the security of IoT smart homes by assessing the efficacy of several deep learning algorithms in detecting and classifying multiclass botnet attacks. We employed the Bot-IoT dataset to implement and evaluate the effectiveness of three advanced deep learning models: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN), and two Ensemble Learning (EL) methods, namely Gradient Boost (GB) and AdaBoost (AB). The primary objective was to determine which models offer the most reliable protection against sophisticated cyber threats targeting smart home environments. The experimental findings demonstrate that the DL models consistently achieved superior performance compared to the EL models across multiple performance metrics, such as accuracy, sensitivity, false positive rate (FPR), false negative rate (FNR), Matthews correlation coefficient (MCC), and Area Under the Curve (AUC). The results emphasize the capability of DL models to greatly enhance the security of IoT smart homes, offering a robust defense mechanism against the changing nature of botnet attacks. This study emphasizes the crucial importance of utilizing modern DL methods to protect increasingly interconnected and vulnerable IoT ecosystems.
  4. Evaluating the Performance of SVM and Random Forest in Air Quality Monitoring and Prediction

    G. Arthy, M. Malathi, P. Sinthia, P. Nagarajan, N. Ashokkumar, Kavitha Thandapani
    Abstract
    Air pollution is becoming a major threat in Indian cities, and people face serious consequences as a result. It affects the environment and the living standard of individuals. Major pollutants are particulate matter PM 10 and PM 2.5, SO2, CO, NO2, O3, NH3, Pb, Ni, As, Benzo(a)pyrene, and Benzene. Inhaling these hazardous substances leads to severe health issues. Predicting the air pollution of a particular area or city can help the government take appropriate measures to reduce it, and the public can protect themselves if they receive alerts regarding pollution. Here, the Random Forest Algorithm (RFA) and Support Vector Machine (SVM) algorithms are applied to predict pollutants such as PM 2.5, PM 10, CO, and SO2. The performance is evaluated by measuring the Root Mean Square Error (RMSE). Prediction graphs show that the true values and the values estimated by the models are close to each other, which reflects the accuracy of the algorithms. The error for SVM with a non-linear kernel was 0.92, and for the Random Forest algorithm the error was 0.91. The results demonstrate that machine-learning algorithms can be utilized effectively to predict the AQI.
  5. Hybrid Dehazing Method for Low-Bandwidth Satellite Images Based on Generative Adversarial Network

    B. A. Sabarish, R. Aarthi, R. Dhamayandhi, Akshaya Sajith
    Abstract
    Remote sensing using satellite images plays a crucial role in gathering data about the Earth's surface, but haze degradation at low bandwidth can significantly compromise image clarity and utility. This paper introduces a hybrid dehazing method designed specifically for low bandwidth satellite images using Generative Adversarial Networks (GANs). The approach enhances the visual quality of hazy satellite imagery by integrating deep learning and advanced image processing techniques to counteract atmospheric effects while preserving essential image details necessary for remote sensing applications. Through experimental analysis, the proposed method exhibits its ability to produce clear, high-fidelity satellite images even under dense haze conditions. Compared with state-of-the-art methods such as DCP, Dehaze-Net, U-Net, and AIDED-Net, the hybrid approach shows a reduction in root mean square error of approximately 13% to 15% across different haze intensities and an average improvement of 34% in the structural similarity index measure. For dense hazy images, the method achieves a minimal AMSE value of 4.2, outperforming other techniques that fall in the range of 4.4 to 5.7. These results confirm the proposed method as a promising solution for enhancing satellite image quality across varying environmental conditions.
  6. Fibonacci Based Security Algorithm for ECG Signal

    A. Electa Alice Jayarani, Helen K. Joy, S. Thenmozhi, V. Sangeetha, Bhargavi Ananth
    Abstract
    The rising dependence on telemedicine and wearable health devices has made the secure transmission of sensitive biomedical information, such as Electrocardiogram (ECG) signals, a critical concern. In this paper, we propose a Fibonacci-based security algorithm to guarantee the confidentiality and integrity of ECG signals during transmission. The proposed algorithm leverages the unique properties of the Fibonacci sequence to create a lightweight and effective encryption mechanism tailored to the real-time requirements of ECG data. By combining Fibonacci transformations with conventional cryptographic strategies, the proposed method achieves enhanced security with negligible computational overhead, making it suitable for resource-constrained devices. The algorithm is evaluated in terms of encryption strength, computational efficiency, and resistance to cryptographic attacks. Experimental results demonstrate that the Fibonacci-based approach secures ECG signals while preserving the quality of the original clinical information, guaranteeing both security and accuracy in remote health monitoring systems, with a signal-to-noise ratio greater than 50 dB.
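The paper's exact construction is not reproduced here, but a hypothetical sketch of the general idea, deriving a lightweight keystream from the Fibonacci recurrence and XOR-ing it with quantized ECG samples, might look like this (the seed, modulus, and 8-bit sample encoding are illustrative assumptions, not the authors' design):

```python
def fib_keystream(n, seed=(1, 1), mod=256):
    """Generate n keystream bytes from a Fibonacci recurrence (illustrative only)."""
    a, b = seed
    stream = []
    for _ in range(n):
        stream.append(a % mod)
        a, b = b, (a + b) % mod
    return stream

def fib_xor(samples, seed=(1, 1)):
    """XOR each 8-bit ECG sample with the Fibonacci keystream.

    XOR is its own inverse, so applying the same call again decrypts.
    """
    ks = fib_keystream(len(samples), seed)
    return [s ^ k for s, k in zip(samples, ks)]

ecg = [128, 130, 127, 125, 131]   # toy quantized ECG samples (0-255)
cipher = fib_xor(ecg)
restored = fib_xor(cipher)        # round trip restores the original signal
```

A scheme this simple would not resist serious cryptanalysis on its own; the abstract's point is that such Fibonacci-driven transformations are combined with conventional cryptographic techniques.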
  7. Hybrid CNN-LSTM with Attention Mechanism for Medical Visual Question Answering

    Vandana Ratwani, Jitendra Bhatia, Jitali Patel
    Abstract
    Medical Visual Question Answering (MedVQA) has become an important research field at the convergence of medical imaging and natural language processing for its potential to assist medical practitioners by providing precise answers to image-related questions. This paper introduces a hybrid framework combining a CNN and an LSTM with an attention mechanism for question answering over medical images. The framework processes medical images and interprets natural language questions to generate relevant answers. The multimodal architecture uses a Convolutional Neural Network (CNN) for extracting features from the image and a Long Short-Term Memory (LSTM) network for handling the question and answer sequences. Multimodal fusion in the framework was performed using an attention mechanism. The model was trained and evaluated on two datasets: ImageCLEF VQA-MED 2019 and VQA-RAD 2019. On the VQA-MED 2019 dataset, the model achieved a training accuracy of 98%, a validation accuracy of 61%, and a testing accuracy of 60% across 69 classes. For the VQA-RAD dataset, the model attained a training accuracy of 98%, a validation accuracy of 70%, and a testing accuracy of 74% across 8 classes.
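Question-guided attention fusion of the kind described above can be illustrated with a toy sketch (not the authors' architecture): small vectors stand in for CNN region features and for the LSTM question encoding, and the fused feature is a softmax-weighted sum of the regions:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_feats, question_vec):
    """Question-guided attention over image regions.

    Each region's score is its dot product with the question vector; the
    fused feature is the softmax-weighted sum of the region features.
    """
    scores = [sum(r * q for r, q in zip(region, question_vec))
              for region in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    fused = [sum(w * region[d] for w, region in zip(weights, region_feats))
             for d in range(dim)]
    return weights, fused

regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy CNN region features
question = [2.0, 0.0]                             # toy LSTM question encoding
weights, fused = attend(regions, question)
```

Regions aligned with the question direction receive higher attention weight, so the fused vector emphasizes question-relevant image content.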
  8. A Hybrid CNN-CapsNet Pipelined Approach For Disease Diagnostic With Severity Estimation

    Suhaib Aalam Bhat, Saliq Neyaz, Sidrat Shafiq Khan, Yash Paul, Rajesh Singh
    Abstract
    This research presents a novel and robust approach for the disease classification of leaves by integrating YOLOv9 with attention-enhanced CNNs such as VGG-19, EfficientNetV2B0, and ResNet101, along with a hybrid Capsule Network (CapsNet) built on a ResNet50 backbone with a lightweight attention mechanism. YOLOv9 is trained to detect and segment individual leaves from input images, each of which is then passed to the enhanced CNNs or the hybrid CapsNet for disease classification and severity estimation. The proposed hybrid approach emphasizes local hierarchical features while preserving base CNN-extracted features, representing a significant advancement over existing methods, as such a pipelined integration of YOLOv9 with CapsNets has not been considerably explored in prior literature. The enhanced CNN models achieved high validation accuracies, with VGG-19 reaching 98.69%, EfficientNetV2B0 achieving 99.68%, and ResNet101 scoring 99.55%, while the hybrid CapsNet outperformed all with an accuracy of 99.81%. The disease classification was conducted using the publicly available PlantVillage dataset consisting of 67,000 images, while a custom dataset for leaf detection was curated from Kaggle and Roboflow sources. The proposed method not only improves classification accuracy but also contributes to practical agricultural benefits by enabling early disease detection and severity analysis, potentially reducing pesticide usage and enhancing crop management.
  9. Bird Species Recognition Using YOLOv8: A Deep Learning Approach for Habitat Conservation and Preservation

    Vamsi Krishna Karanam, Venkata Sai Abinay Kommuri, Harsha Vardhan Reddy Lekkala, Joshuva Arockia Dhanraj
    Abstract
    Accurate identification of bird species is essential for conservation and ecological studies. This study proposes a real-time bird detection system using an advanced deep learning model, YOLOv8, developed to perform well against ecological obstacles. The model was trained on a custom dataset, created and labeled through Roboflow, at a resolution of 640 × 640 pixels over 50 epochs. The model exhibited an inference speed of 2.9 milliseconds per image and offered a mean Average Precision (mAP 50) of 94.3%, indicating greater accuracy and efficiency than earlier models like YOLOv5 and Faster R-CNN. Recognition performance was evaluated on several bird species, including the Jawa Sparrow, Nicobar Pigeon, and Golden Eagle, demonstrating reliability under low ambient light and against cluttered backgrounds. The model can also be deployed on edge devices such as the NVIDIA Jetson Nano, suggesting usability for large-scale ecological monitoring. By providing real-time knowledge of species distribution, population trends, and habitat change, this solution is viable for conservationists and wildlife researchers, supporting sound, data-driven environmental decision making. Keywords: Bird species identification; YOLOv8; Object detection; biodiversity.
  10. Federated Deep Learning Framework for Efficient Detection of Diabetic Retinopathy

    K. Senthur Kumaran, Keerthika Periasamy, V. Swathi Reddy, Thanu Athitya Mohankumar
    Abstract
    Diabetic Retinopathy (DR) is one of the major causes of vision loss and blindness in people who have been diagnosed with diabetes, and early diagnosis is always helpful for effective treatment. This research proposes a federated-learning-based framework that combines MobileNetV2 and ResNet50 architectures to detect and classify DR; the method also strengthens data privacy. To overcome class imbalance, the proposed method applies a weighted loss function, uses PCA for dimensionality reduction, and augments the data to develop a stronger model. Using federated learning, the framework enables distributed training of models on separate data while still maintaining patient confidentiality and good accuracy. The research suggests that bringing federated deep learning into hospitals can create practical, secure, and scalable ways to address problems in medical imaging. This work alleviates the obstacles that exist in spotting diabetic retinopathy, including gathering large amounts of annotated data, using complex deep learning methods, and embedding these methods in day-to-day medical practice. Under federated learning, a ResNet50 deep learning model achieves 4.35% higher precision, 12.9% better recall, 15.94% greater F1-score, 20.97% better accuracy, and 2.67% greater AUC-ROC than lighter deep learning models like MobileNetV2. Existing patterns and possible future directions are then discussed, with an emphasis on finding methods to improve the diagnosis of Diabetic Retinopathy.
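Federated training of the kind described in this and the first abstract typically aggregates client models by federated averaging (FedAvg). A minimal sketch, assuming each client contributes a flat parameter vector and its local dataset size (the vectors and sizes below are invented for illustration):

```python
def fedavg(client_weights, client_sizes):
    """Federated averaging: combine client model weights, weighted by data size.

    client_weights: list of equal-length parameter vectors, one per client.
    client_sizes:   number of local training samples at each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
            for d in range(dim)]

# Two hospitals' (toy) model parameter vectors and local dataset sizes:
global_w = fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300])
```

Only parameter vectors leave the clients; the raw patient images never do, which is the privacy property the abstract highlights.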
  11. Machine Learning-Driven Strategies for Laboratory Diagnostic Pathway Optimization

    Rishithaa Maligireddy, Jayaprakash Vemuri
    Abstract
    The Hepatitis C Virus (HCV) is a blood-borne infection that mostly affects the liver. If untreated, it can cause cirrhosis, chronic liver disease, or liver cancer. The risk of serious liver damage rises when early signs are absent, making prompt detection difficult. For treatment and patient management to be successful, an early and precise diagnosis is necessary. Based on the outcomes of laboratory tests, different machine-learning approaches were used in this work to categorize blood donors and Hepatitis C patients. The dataset includes 615 instances and 12 features. Mean/mode imputation was used for data preprocessing to handle missing values. The most important characteristics for categorization were identified using feature selection approaches. Hepatitis C infection was predicted using a variety of machine learning models, such as ensemble learning, random forest, decision trees, and support vector machines. To guarantee a thorough examination of model efficacy, the models’ performance was assessed using accuracy, precision, recall, F1-score, ROC curve, and confusion matrix. According to the results, Ensemble Learning had the lowest accuracy of 87.8%, while the Decision Tree classifier outperformed the other models with the best accuracy of 99.5%. The work illustrates the potential of ML-based predictive models in medical applications by highlighting the effects of several machine-learning techniques on HCV detection. By lowering the chance of complications and assisting medical professionals in making well-informed judgments, the findings help to improve early detection techniques.
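The evaluation metrics listed above (accuracy, precision, recall, F1-score) all derive from the binary confusion matrix. A minimal sketch (the counts below are invented for illustration, not the paper's results):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# e.g. 90 true positives, 5 false positives, 10 false negatives, 495 true negatives:
m = classification_metrics(tp=90, fp=5, fn=10, tn=495)
```

For imbalanced medical datasets like this one (few positives among many blood donors), precision, recall and F1 are more informative than raw accuracy.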
  12. Hybrid Deep Learning for Meme Sentiment and Emotion Analysis Using LLMs

    D. Swapna, M. Shanmuga Sundari, T. Nandini, S. K. Nyasa, M. Bhavya Bhavika
    Abstract
    Memes have become a dominant form of digital communication, blending images and text to convey emotions, humor, sarcasm, or even controversial viewpoints. While memes enhance online engagement, they also pose risks by spreading hate speech and inciting violence. To address this, we utilize Artificial Intelligence (AI) to analyze and classify meme emotions. Our approach employs GPT-4 for deep contextual understanding, enabling precise sentiment categorization. The implemented model identifies and quantifies seven emotional categories—happy (0), sad (0), angry (0), surprised (2), neutral (2), fearful (2), and disgusted (1)—based on meme content analysis. The results demonstrate the effectiveness of AI in detecting nuanced emotional cues, aiding in the development of responsible content moderation systems. This research enhances digital safety while preserving the expressive nature of memes through data-driven sentiment classification.
  13. Advancements, Challenges, and Recent Trends in Facial Expression Recognition Systems: A Comprehensive Review

    C. Sheeba Joice, S. Hitha Shanthini, C. Jenisha
    Abstract
    Facial Expression Recognition (FER) is necessary for human–computer interaction and affective computing, and it has evolved significantly in recent years, with both Machine Learning-Based Face Expression Recognition Systems (ML-FERS) and Deep Learning-Based Face Expression Recognition Systems (DL-FERS) playing pivotal roles. This paper presents a comprehensive survey that explores the latest developments in Facial Expression Recognition Systems (FERS), encompassing both deep learning-based and traditional machine learning-based approaches. It provides an elaborate examination of the critical components of FERS, including data acquisition and preprocessing, feature extraction, emotion modeling, and evaluation metrics. By analyzing the strengths and limitations of these techniques, the survey offers insights into their applicability across different domains and practical scenarios, providing a valuable resource for researchers, practitioners, and decision-makers and facilitating informed decision-making and responsible development in this dynamic and influential area of technology.
  14. Hyperspectral Image Feature Extraction Using A Light Bidirectional Encoder Representations from Transformers

    Rajat Kumar Arya, Pratik Chattopadhyay, Rajeev Srivastava
    Abstract
    The rich spectral information stored in every pixel of hyperspectral imaging (HSI) has made it popular in various real-world applications. The nonlinear connection between correlated HSI data items and the generated spectral data makes classification complex and difficult to achieve using conventional methods. Researchers have recognized that spectral information alone is insufficient to classify HSI data, so spatial information needs to be incorporated to improve classification outcomes. This paper employs the BERT and ALBERT models, recently proposed in natural language processing (NLP), as feature extraction models to acquire spectral information. These models use spectral signatures to learn the relationship between pixels and their neighbors, seeking to understand context by learning the relationships between tokens. Despite this, BERT and ALBERT might not effectively use spatial information. This paper incorporates spatial information into BERT and ALBERT and proposes Spatial-BERT and Spatial-ALBERT. Here, we efficiently integrate each target pixel’s spatial and spectral information in the HSI data to improve classification performance. The experimental results obtained on a publicly available dataset, the University of Pavia, demonstrate that the Spatial-ALBERT model achieves satisfactory performance with few parameters and relatively better efficiency than the Spatial-BERT model. The results show that Spatial-ALBERT outperformed the existing CNN- and RNN-based methods and performed better in twelve out of twenty tests than Spatial-BERT.
  15. An Empirical Analysis of Machine Learning and Deep Learning for Stock Market Forecasting

    N. J. Jesan, R. Rahul Ganesh, T. Gireesh Kumar, I. Praveen
    Abstract
    This paper compares various machine learning (ML), Prophet, and deep learning (DL) models’ performances in forecasting stock market trends using different datasets. The research uses the NVIDIA Corporation (NVDA) data from January 1, 2019, to January 1, 2024, showing forecasted trends and performance measures of each model. The Random Forest model performed best on important regression metrics: Mean Absolute Error (MAE = 0.31211), Mean Squared Error (MSE = 0.37742), Root Mean Squared Error (RMSE = 0.61434), R-squared (0.99986), and Mean Absolute Percentage Error (MAPE = 1.07827). At the same time, the LSTM model performed the best on risk-adjusted measures such as Sortino Ratio (6.26923), Jensen’s Alpha (1.51894), Maximum Drawdown (1.25614), and Mean Directional Accuracy (0.65088). The Decision Tree Regressor also exhibited high performance in Directional Accuracy (0.98857). The results emphasize the quality of ML and DL models for stock market forecasting to facilitate data-informed decisions by investors.
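The regression metrics reported above can be computed directly from paired actual/predicted series. A minimal sketch with invented closing prices (not the paper's NVDA data):

```python
import math

def regression_metrics(actual, predicted):
    """MAE, RMSE, R-squared and MAPE for paired value series."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_a = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}

closes = [100.0, 102.0, 101.0, 105.0]   # toy actual closing prices
preds  = [ 99.0, 103.0, 101.5, 104.0]   # toy model predictions
m = regression_metrics(closes, preds)
```

Risk-adjusted measures such as the Sortino Ratio and Jensen's Alpha require return series and a benchmark, so they are not sketched here.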
  16. Robust Cockpit Panel Image Processing for Shape Analysis Using Deep Learning-Based Shape Classification and Transfer Learning

    Joseph Chakravarthi Chavali, D. Abraham Chandy
    Abstract
    Reliable identification of shapes in aircraft cockpit panels is critical for ensuring operational safety and supporting automated fault detection systems. Though effective for basic shapes, traditional rule-based methods tend to underperform when faced with variable image quality or orientation. This paper presents a deep learning-based framework enhanced by transfer learning to address these limitations. A convolutional neural network (CNN) is trained using a hybrid dataset comprising 40,000 generic geometric images and 8,000 cockpit-specific samples generated via augmentation. The model’s performance is significantly improved by fine-tuning it on domain-specific data, achieving 100% classification accuracy during validation. The proposed method enhances accuracy, adaptability, and robustness in real-world cockpit scenarios, outperforming traditional methods.
  17. SegCSA-X: A Robust Segmentation Model for Palmprint Images Under Illuminated Challenges

    Rinkal Jain, Chintan Bhatt, Shakti Mishra
    Abstract
    Image segmentation plays a critical role in computer vision applications such as biometric verification, medical imaging, and automated surveillance. However, existing segmentation methods struggle in challenging lighting conditions, where illumination variations introduce distortions. To address this issue, we propose Segmentation with Channel and Spatial Attention - Palmprints (SegCSA-X), a novel deep learning-based segmentation framework designed to enhance robustness against illumination-induced artifacts. The model integrates channel and spatial attention mechanisms to improve feature extraction and segmentation accuracy. SegCSA-X is evaluated against the Segment Anything Model (SAM) using three publicly available palmprint datasets: CASIA, REST, and IITD. While both models perform well on CASIA, SegCSA-X achieves superior results with an IoU of 98.88% and Dice Score of 99.44%. On the more challenging REST and IITD datasets, which contain significant illumination variations, SegCSA-X significantly outperforms SAM, achieving IoU scores of 96.47% (IITD) and 97.22% (REST), and Dice scores of 98.20% (IITD) and 98.58% (REST). These results demonstrate that SegCSA-X establishes a new benchmark for palmprint segmentation, particularly in adverse lighting conditions.
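The IoU and Dice scores reported above are set-overlap measures between a predicted segmentation mask and the ground truth. A minimal sketch on toy binary masks (not the authors' evaluation code):

```python
def iou_and_dice(mask_a, mask_b):
    """IoU and Dice score for two flat binary masks (lists of 0/1)."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    union = sum(a | b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice

pred  = [1, 1, 1, 0, 0, 1]   # toy predicted palmprint mask, flattened
truth = [1, 1, 0, 0, 1, 1]   # toy ground-truth mask
iou, dice = iou_and_dice(pred, truth)
```

The two are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why the paper's Dice scores sit slightly above its IoU scores.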
  18. Evaluating the Impact of PCA-Based Feature Extraction on Predicting Customer Attrition in the Banking Sector

    N. Siyad, Sunu Mary Abraham, Ann Baby, Jaya Vijayan
    Abstract
    This study investigates customer attrition prediction in the credit card segment of the banking industry, with a focus on enhancing classification performance through advanced feature engineering techniques. The preprocessing pipeline includes encoding of categorical variables, normalization of numerical features, and dimensionality reduction via mixed-type Principal Component Analysis (PCA), with K-means clustering used to identify latent customer segments. Feature importance visualizations are leveraged to guide model refinement and interpretability. Among the evaluated models, Extreme Gradient Boosting (XGBoost) demonstrates superior predictive performance, achieving an overall accuracy of 95.82%, a sensitivity of 84.17%, and a balanced accuracy of 91.10%. Notably, XGBoost exhibits robust specificity (98.03%) and recall (97.04%) in identifying attrited customers, outperforming other classifiers such as Random Forest and Support Vector Machines (SVM), which show marginal declines in performance after feature selection. Overall, the findings highlight the critical role of feature selection and engineering in optimizing churn prediction models.
Title
Computer Vision and Robotics
Editors
Harish Sharma
Abhishek Bhatt
Chirag Modi
Andries Engelbrecht
Copyright Year
2026
Electronic ISBN
978-3-032-06253-6
Print ISBN
978-3-032-06252-9
DOI
https://doi.org/10.1007/978-3-032-06253-6

PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.
