
2023 | Book

Bioinformatics and Biomedical Engineering

10th International Work-Conference, IWBBIO 2023, Meloneras, Gran Canaria, Spain, July 12–14, 2023, Proceedings, Part II

Edited by: Ignacio Rojas, Olga Valenzuela, Fernando Rojas Ruiz, Luis Javier Herrera, Francisco Ortuño

Publisher: Springer Nature Switzerland

Book series: Lecture Notes in Computer Science


About this book

This volume constitutes the proceedings of the 10th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2023, held in Meloneras, Gran Canaria, Spain, during July 12–14, 2023.
The 79 papers presented in the proceedings were carefully reviewed and selected from 209 submissions. The papers cover the latest ideas and realizations in the foundations, theory, models, and applications of interdisciplinary and multidisciplinary research encompassing the disciplines of computer science, mathematics, statistics, biology, bioinformatics, and biomedicine.

Table of contents

Frontmatter

Feature Selection, Extraction, and Data Mining in Bioinformatics

Frontmatter
Agent Based Modeling of Fish Shoal Behavior

Fish require a sufficient amount of dissolved oxygen in the water to breathe and maintain their metabolic functions. Insufficient levels of dissolved oxygen can lead to stress, illness, and even death among the fish population. Therefore, it is crucial to model and simulate the relationship between dissolved oxygen levels and fish behavior in order to optimize aquarium design and management. One approach to studying this relationship is multi-agent-based modeling. This method involves creating a virtual environment in which multiple agents, representing individual fish, interact with each other and with the environment according to a set of predefined rules. In the context of aquarium simulation, the agents represent individual fish, and the environment represents the aquarium water and its parameters.
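As a loose illustration of the multi-agent idea (the oxygen values, tank layout, and movement rule below are invented for this sketch, not taken from the paper), each fish agent can follow a simple predefined rule of moving toward better-oxygenated water:

```python
# Illustrative multi-agent sketch (invented values, not the paper's model):
# fish agents on a 1-D tank gradient of dissolved oxygen, each moving one
# cell per step toward higher-oxygen water.
oxygen = [2.0, 3.0, 4.0, 6.0, 8.0]   # mg/L along the tank; index 4 is richest
fish = [0, 1, 2, 2, 3]               # initial cell positions of five agents

def step(positions):
    new_positions = []
    for p in positions:
        # candidate moves: stay, one cell left, one cell right (clamped)
        candidates = {p, max(0, p - 1), min(len(oxygen) - 1, p + 1)}
        # predefined rule: pick the candidate cell with the most oxygen
        new_positions.append(max(candidates, key=lambda c: oxygen[c]))
    return new_positions

for _ in range(10):
    fish = step(fish)

print(fish)  # all agents congregate in the best-oxygenated cell
```

Richer rules (schooling, collision avoidance, oxygen depletion by the fish themselves) would be added in the same per-agent update loop.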

Pavla Urbanova, Ievgen Koliada, Petr Císař, Miloš Železný
Entropy Approach of Processing for Fish Acoustic Telemetry Data to Detect Atypical Behavior During Welfare Evaluation

Fish telemetry is an important tool for studying fish behavior, allowing researchers to monitor fish movements in real time. Analyzing telemetry data and translating it into meaningful indicators of fish welfare remains a challenge. This is where entropy approaches can provide valuable insights. Methods based on information theory can quantify the complexity and unpredictability of the distribution of animal behavior, providing a comprehensive understanding of the animal's state. Entropy-based techniques can analyze telemetry data and detect changes or irregularities in fish behavior. By analyzing accelerometer data with an entropy approach, it is possible to identify atypical behavior that may be indicative of compromised welfare.
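As a rough sketch of the information-theoretic idea (the behavioral state labels below are hypothetical, not the paper's telemetry features), Shannon entropy over a distribution of behavioral states rises as behavior becomes less predictable:

```python
import math
from collections import Counter

# Sketch of an entropy measure over a behavioral state distribution
# (states are hypothetical activity labels binned from accelerometer data).
def shannon_entropy(states):
    counts = Counter(states)
    total = len(states)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

regular = ["swim"] * 9 + ["rest"]                          # predictable behavior
atypical = ["swim", "rest", "dart", "sink", "circle"] * 2  # erratic behavior

# Higher entropy indicates a less predictable behavior distribution,
# which, per the abstract, may flag compromised welfare.
assert shannon_entropy(regular) < shannon_entropy(atypical)
```

A two-state 50/50 distribution gives exactly 1 bit; the uniform five-state case above gives log2(5) ≈ 2.32 bits.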

Jan Urban
Determining HPV Status in Patients with Oropharyngeal Cancer from 3D CT Images Using Radiomics: Effect of Sampling Methods

Non-invasive detection of human papillomavirus (HPV) status is important for the treatment planning of patients with oropharyngeal cancer (OPC). In this work, three-dimensional (3D) head and neck computed tomography (CT) scans are utilized to identify HPV infection status in patients with OPC by applying radiomics and several resampling methods to handle highly imbalanced data. A total of 1142 radiomic features were obtained from the segmented CT images of 238 patients. The features used were selected through correlation coefficient analysis, feature importance analysis, and backward elimination. The fifty most important features were chosen. Six different sampling methods, namely the Synthetic Minority Oversampling Technique (SMOTE), the Support Vector Machine Synthetic Minority Oversampling Technique (SVMSMOTE), the Adaptive Synthetic Sampling Method (ADASYN), NearMiss, Condensed Nearest Neighbors (CNN), and Tomek's Link, were applied to the training set for each of the positive and negative HPV classes. Two different machine learning (ML) algorithms, a Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost), were applied as predictive classification models. Model performances were assessed separately on 20% of the data. Oversampling methods displayed better performance than undersampling methods. The best performance was seen in the combination of the SMOTE and XGBoost algorithms, which had an area under the curve (AUC) of 0.93 (95% CI: 82–99) and an accuracy of 90% (95% CI: 78–96). Our work demonstrated a reasonable accuracy in the forecast of HPV status using 3D imbalanced and small datasets. Further work is needed to test the algorithms on larger, balanced, and multi-institutional data.
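The core of SMOTE-style oversampling can be sketched in a few lines of plain Python (an illustrative toy with made-up points, not the imbalanced-learn implementation typically used in such studies): each synthetic minority sample is interpolated between a minority point and one of its nearest minority-class neighbors.

```python
import random

# Minimal SMOTE-style sketch: a synthetic minority sample lies on the
# segment between a minority point and its nearest minority neighbor.
random.seed(42)

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]  # toy feature vectors

def nearest_neighbor(point, points):
    others = [p for p in points if p != point]
    return min(others, key=lambda q: (q[0] - point[0])**2 + (q[1] - point[1])**2)

def smote_sample(points):
    base = random.choice(points)
    neighbor = nearest_neighbor(base, points)
    gap = random.random()                      # interpolation factor in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbor))

synthetic = [smote_sample(minority) for _ in range(3)]
# interpolation keeps each synthetic point inside the minority class's
# bounding box, so no sample is placed in implausible feature space
for x, y in synthetic:
    assert 0.9 <= x <= 1.2 and 0.9 <= y <= 1.3
```

SVMSMOTE and ADASYN refine where along the class boundary these interpolations are concentrated; NearMiss, CNN, and Tomek's Link instead discard majority-class points.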

Kubra Sarac, Albert Guvenis
MetaLLM: Residue-Wise Metal Ion Prediction Using Deep Transformer Model

Proteins bind to metals such as copper, zinc, and magnesium, serving various purposes such as importing, exporting, or transporting metal to other parts of the cell as ligands, and maintaining a stable protein structure to function properly. A metal binding site indicates the single amino acid position where a protein binds a metal ion. Manually identifying metal binding sites is expensive, laborious, and time-consuming. A tiny fraction of the millions of proteins in UniProtKB – the most comprehensive protein database – are annotated with metal binding sites, leaving many millions of proteins awaiting metal binding site annotation. Developing a computational pipeline is thus essential to keep pace with the growing number of proteins. A significant shortcoming of the existing computational methods is the lack of consideration of long-term dependencies among residues. Other weaknesses include low accuracy, absence of positional information, hand-engineered features, and a pre-determined set of residues and metal ions. In this paper, we propose MetaLLM, a metal binding site prediction technique, leveraging recent progress in self-supervised attention-based (e.g. Transformer) large language models (LLMs) and the considerable amount of publicly available protein sequences. LLMs are capable of modelling long-range residue dependencies in a sequence. The proposed MetaLLM uses a transformer pre-trained on an extensive database of protein sequences and later fine-tuned on metal-binding proteins for multi-label metal ion prediction. A stratified 10-fold cross-validation shows more than 90% precision for the most prevalent metal ions. Moreover, the comparative performance analysis confirms the superiority of the proposed MetaLLM over classical machine-learning techniques.

Fairuz Shadmani Shishir, Bishnu Sarker, Farzana Rahman, Sumaiya Shomaji

Genome-Phenome Analysis

Frontmatter
Prediction of Functional Effects of Protein Amino Acid Mutations

Human Single Amino Acid Polymorphisms (SAPs) or Single Amino Acid Variants (SAVs), usually referred to as nonsynonymous Single Nucleotide Variants (nsSNVs), represent the most frequent type of genetic variation in the population. They originate from non-synonymous single nucleotide variations (missense variants), where a single base-pair substitution alters the genetic code in such a way that it produces a different amino acid at a given position. Since mutations are commonly associated with the development of various genetic diseases, it is of utmost importance to understand and predict which variations are deleterious and which are neutral. Computational tools based on machine learning are becoming promising alternatives to tedious and highly costly mutagenic experiments. Generally, the varying quality, incompleteness, and inconsistencies of nsSNV datasets degrade the usefulness of machine learning approaches. Consequently, robust and more accurate approaches are essential to address these issues. In this paper, we present the application of a consensus classifier based on holdout sampling, which shows robust and accurate results, outperforming currently available tools. We generated 100 holdouts to sample different classifier architectures and different classification variables during the training stage. The best-performing holdouts were selected to construct a consensus classifier and tested blindly using a k-fold (1 ≤ k ≤ 5) cross-validation approach. We also performed an analysis of the best protein attributes for predicting the effects of nsSNVs by calculating their discriminatory power. Our results show that our method outperforms other currently available tools and provides robust results, with small standard deviations among folds and high accuracy.
The superiority of our algorithm is based on the utilization of a tree of holdouts, in which different machine learning algorithms are sampled with different boundary conditions or different predictive attributes.
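A toy sketch of the consensus step (the classifiers and thresholds below are invented for illustration, not the authors' holdout architectures): each holdout-trained model votes on a variant, and the majority label is the consensus prediction.

```python
# Toy consensus-of-holdouts sketch: models trained on different holdouts
# vote, and the majority label wins (hypothetical score threshold models).
def consensus_predict(classifiers, x):
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

# three hypothetical holdout-trained classifiers that disagree near the margin
clf_a = lambda score: "deleterious" if score > 0.4 else "neutral"
clf_b = lambda score: "deleterious" if score > 0.5 else "neutral"
clf_c = lambda score: "deleterious" if score > 0.6 else "neutral"

print(consensus_predict([clf_a, clf_b, clf_c], 0.55))  # 2 of 3 votes: deleterious
print(consensus_predict([clf_a, clf_b, clf_c], 0.45))  # 2 of 3 votes: neutral
```

Aggregating many such holdout-trained models is what reduces the fold-to-fold variance reported in the abstract.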

Óscar Álvarez-Machancoses, Eshel Faraggi, Enrique J. de Andrés-Galiana, Juan Luis Fernández-Martínez, Andrzej Kloczkowski
Optimizing Variant Calling for Human Genome Analysis: A Comprehensive Pipeline Approach

The identification of genetic variations in large cohorts is critical for defining patient cohorts, assessing disease risks, and developing more effective treatments. To support this analysis, we improved a variant calling pipeline for the human genome using state-of-the-art tools, including GATK (Hard Filter/VQSR) and DeepVariant. The pipeline was tested on a computing cluster where it was possible to compare Illumina Platinum genomes using different approaches. Moreover, by using a secure data space, we provide a solution to privacy and security concerns in genomics research. Overall, this variant calling pipeline has the potential to significantly advance the field of genomics research, improve healthcare outcomes, and simplify the analysis process. Therefore, it is critical to rigorously evaluate such pipelines' performance before implementing them in clinical settings.

Miguel Pinheiro, Jorge Miguel Silva, José Luis Oliveira

Healthcare and Diseases

Frontmatter
Improving Fetal Health Monitoring: A Review of the Latest Developments and Future Directions

Devices for monitoring heart rate and fetal movement are becoming increasingly sophisticated with the latest technological advancements. However, there is a pressing need for a more comprehensive review and analysis of these tools. The objective of this literature review is to identify fetal health monitoring devices, evaluate the sensitivity of monitoring fetal heart rate and growth/movement, and determine the target users for these devices. Methods: the search was conducted using the PubMed and Scopus databases, with PICO-based keywords that included pregnant women or pregnancy as the population, and fetal or heart rate monitoring tool or fetal movement tool with sensitivity or reactivity as the research interest. A total of 2077 papers were initially identified, with 25 selected after article screening using the PRISMA approach and critical appraisal with JBI. Results: the analysis yielded four categories: the development of monitoring devices for fetal well-being, algorithms for more accurate maternal-fetal FHR filtering, fetal well-being indicators, and target users. The monitoring technology applied wave detection through cardiography and myography. The devices demonstrated a signal sensitivity of more than 75% for both the mother and the fetus. Conclusions: the analysis of the 25 articles revealed that monitoring technology is rapidly evolving, but almost all devices are designed for use by health workers. Only the piezo polymer pressure sensor is intended for independent monitoring by mothers and families. The development and research of independent fetal monitoring are necessary to improve monitoring during the new adaptation period after the COVID-19 pandemic.

Restuning Widiasih, Hasballah Zakaria, Siti Saidah Nasution, Saffan Firdaus, Risma Dwi Nur Pratiwi
Deep Learning for Parkinson’s Disease Severity Stage Prediction Using a New Dataset

Parkinson's Disease (PD) is a progressive neurological disorder affecting the Basal Ganglia (BG) region in the mid-brain, producing degeneration of motor abilities. Severity is generally assessed through the Unified Parkinson's Disease Rating Scale (UPDRS) as well as the changes noticed in BG size in Positron Emission Tomography (PET) images. Predicting a patient's severity state through the analysis of these symptoms over time remains a challenging task. This paper proposes a Long Short-Term Memory (LSTM) model using a newly created dataset in order to predict the next severity stage. The dataset includes the UPDRS scores and the BG size for each patient. This is achieved by implementing a new algorithm that focuses on PET images and computes BG size. These computed values were then merged with UPDRS scores in a CSV file. The dataset created is fed into the proposed LSTM model for predicting the next severity stage by analyzing the severity scores over time. The model's accuracy was assessed through several experiments and reached 84%, outperforming the other state-of-the-art methods. These results confirm that our proposal holds great promise in providing a visualization of the next severity stage for all patients, which aids physicians in monitoring disease progression and planning efficient treatment.

Zainab Maalej, Fahmi Ben Rejab, Kaouther Nouira
Improved Long-Term Forecasting of Emergency Department Arrivals with LSTM-Based Networks

Patient admission to Emergency Departments (EDs) suffers from great variability. This makes resource allocation difficult to adjust, resulting in an inefficient service. Several studies have addressed this issue with machine learning regressors and time-series analysis. This research proposes the use of improved recurrent neural networks that consider the dynamic nature of the data, introducing contextual variables that improve predictability. Another important requirement from ED administrations is a wider prediction horizon for short- and long-term resource allocation. The results obtained using data from a single hospital in Madrid confirm that the use of deep learning with contextual variables improves predictability to 6% MAPE for seven-day and four-month forecasts. Future research lines include the influence of special events, such as seasonal epidemics, pollution episodes, and sports or leisure events, as well as the extension of this study to emergency departments of different types of hospitals.
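Since the result is reported in MAPE, the metric is worth stating precisely; the arrival counts in this sketch are made up, not the Madrid hospital's data.

```python
# Mean Absolute Percentage Error: the forecast-quality metric the abstract
# reports (6% MAPE). Values below are invented for illustration.
def mape(actual, forecast):
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

daily_arrivals = [200, 180, 220, 210]   # hypothetical ED arrivals per day
predictions    = [190, 186, 209, 218]   # hypothetical model forecasts
print(round(mape(daily_arrivals, predictions), 2))  # → 4.29
```

Note that MAPE is undefined when an actual value is zero, which matters for low-volume wards or hourly resolutions.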

Carolina Miranda-Garcia, Alberto Garces-Jimenez, Jose Manuel Gomez-Pulido, Helena Hernández-Martínez

High-Throughput Genomics: Bioinformatic Tools and Medical Applications

Frontmatter
Targeted Next Generation Sequencing of a Custom Capture Panel to Target Sequence 112 Cancer Related Genes in Breast Cancer Tumors ERBB2 Positive from Lleida (Spain)

Between 15–30% of invasive breast cancers have ERBB2 gene amplifications. Even within such a homogeneous group, every patient has their own prognosis based on different features, some of which are genetic. With that aim, we implemented a custom NGS panel comprising three probe subgroups for testing targeted mutations, copy number alterations, and translocations in tumors with known HR and ERBB2 status previously assessed via immunohistochemistry and fluorescence in situ hybridization. DNA extracted from 47 primary breast cancers previously classified as ERBB2 positive was analyzed with a customized panel of 112 cancer-related genes by targeted sequencing. Output data in FASTQ format were quality-checked, aligned, and variant-called through different algorithms to find gene variations. A total of 20 different pathogenic mutations were found in 44% of tumors. Copy number analysis showed different levels of ERBB2 gene amplification between tumors, as well as different ERBB2 amplicon lengths. Additionally, the analysis of the raw data revealed the existence of two distinct mutation signatures. The identification of gene variation schemes that can yield distinct signatures holds the potential to accurately predict the subset of ERBB2-positive breast cancer patients who would respond best to treatment, specifically based on their pathological complete response (pCR).

Iván Benítez, Izaskun Urdanibia, Xavier Matias-Guiu, Ariadna Gasol, Ana Serrate, Serafín Morales, Ana Velasco
An Accurate Algorithm for Identifying Mutually Exclusive Patterns on Multiple Sets of Genomic Mutations

In cancer genomics, mutually exclusive patterns of somatic mutations are important biomarkers that are suggested to be valuable in cancer diagnosis and treatment. However, detecting these patterns in mutation data is an NP-hard problem, which poses a great challenge for computational approaches. Existing approaches either limit themselves to pair-wise mutually exclusive patterns or rely largely on prior knowledge and complicated computational processes. Furthermore, the existing algorithms are often designed for genotype datasets, which may lose information about tumor clonality, a property emphasized in tumor progression. In this paper, an algorithm for identifying multiple sets with mutually exclusive patterns, based on a fuzzy strategy to deal with real-valued datasets, is proposed. Unlike existing approaches, the algorithm focuses on both similarity within subsets and mutual exclusion among subsets, taking the mutual exclusion degree as the optimization objective rather than a constraint condition. Fuzzy clustering of the mutations is done by means of membership degrees, and a fuzzy strategy is used to iterate the clustering centers and membership degrees. Finally, the target subsets are obtained, which have the characteristics of high similarity within subsets with the largest number of mutations, and high mutual exclusion among subsets with the largest number of subsets. This paper conducted a series of experiments to verify the performance of the algorithm, including simulation datasets and real datasets from TCGA. According to the results, the algorithm shows good performance under different simulation configurations, and some of the mutually exclusive patterns detected from the TCGA datasets are supported by the published literature. This paper compared the performance to MEGSA, the best and most widely used method at present; the purities and computational efficiencies on simulation datasets outperformed MEGSA.
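One common way to quantify the mutual exclusivity of a gene set (a standard coverage-based score, not necessarily the paper's fuzzy objective) is the fraction of covered samples mutated in exactly one gene of the set; the gene names and sample IDs below are hypothetical.

```python
# Coverage-based mutual-exclusivity sketch: of the samples covered by the
# gene set, how many are hit by exactly one gene? (Toy data, one overlap.)
mutations = {                       # hypothetical gene -> mutated sample IDs
    "TP53": {1, 2, 3},
    "KRAS": {4, 5},
    "EGFR": {3, 6},                 # sample 3 overlaps with TP53
}

def exclusivity(gene_sets):
    covered = set().union(*gene_sets.values())
    exactly_once = [s for s in covered
                    if sum(s in hits for hits in gene_sets.values()) == 1]
    return len(exactly_once) / len(covered)

print(exclusivity(mutations))  # 5 of 6 covered samples hit exactly once
```

A perfectly exclusive set scores 1.0; heavily co-mutated genes drag the score toward 0, which is the signal such algorithms maximize.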

Siyu He, Jiayin Wang, Zhongmeng Zhao, Xuanping Zhang
A 20-Year Journey of Tracing the Development of Web Catalogues for Rare Diseases

Rare diseases affect over 350 million individuals worldwide. However, studying such diseases is challenging due to the lack of individuals compliant with study protocols. This unavailability of information raises challenges when defining the best treatments or diagnosing patients in the early stages. Multiple organizations have invested in sharing data and resources without violating patient privacy, which has resulted in several platforms focused on aggregating information. Despite the benefits of these solutions, the evolution of data regulations leads to new challenges that may not be fully addressed by such platforms. Therefore, in this paper, we propose an enhanced version of one of the identified open-source platforms for this purpose. With this work, we were able to propose different strategies for aggregating and sharing information about rare diseases, as well as to analyse the technological evolution in producing tools for biomedical data sharing, namely by analysing the evolution of the selected tool over the last two decades.

João Rafael Almeida, José Luís Oliveira
Unsupervised Investigation of Information Captured in Pathway Activity Score in scRNA-Seq Analysis

With the introduction of single-cell RNA sequencing, research on cell, tissue, and disease heterogeneity has received a new boost. Transforming gene levels into explainable pathways via single-sample enrichment algorithms is a leading analysis step in understanding cell heterogeneity. In this study, eight different single-sample methods were investigated, accompanied by gene-level outcomes as reference. For all, their ability to separate cells and their clustering accuracy were tested. For this purpose, six scRNA-Seq datasets with labelled cells in varying numbers were collected. The PLAGE method shows the best cell separation, with statistically significant differences from the gene level and from six other tested methods. The clustering accuracy analysis also indicates that PLAGE is the leading single-sample enrichment method for scRNA-Seq. Here, the worst performance was observed for the JASMINE algorithm which, contrary to PLAGE, was designed to analyse scRNA-Seq data. Moreover, Louvain clustering shows the best results regarding cell division regardless of the tested single-sample method. Finally, the clustering results given by PLAGE reveal T cell subtypes not initially labelled, showing the great potential of this algorithm in heterogeneity investigation.

Kamila Szumala, Joanna Polanska, Joanna Zyla
Meta-analysis of Gene Activity (MAGA) Contributions and Correlation with Gene Expression, Through GAGAM

It is well known how sequencing technologies have propelled cellular biology research in recent years, giving incredible insight into the basic mechanisms of cells. Single-cell RNA sequencing is at the front of this field, with single-cell ATAC sequencing supporting it and becoming more popular. In this regard, multi-modal technologies play a crucial role, allowing the mentioned sequencing modalities to be performed simultaneously on the same cells. Yet, there is still no clear and dedicated way to analyze this multi-modal data. One of the current methods is to calculate the Gene Activity Matrix, which summarizes the accessibility of the genes at the genomic level, to have a more direct link with the transcriptomic data. However, this concept is not well defined, and it is unclear how various accessible regions impact the expression of the genes. Therefore, this work presents a meta-analysis of the Gene Activity Matrix based on the Genomic-Annotated Gene Activity Matrix model, aiming to investigate the different influences of its contributions on gene activity and their correlation with expression. This allows a better grasp of how the different functional regions of the genome affect not only the activity but also the expression of the genes.

Lorenzo Martini, Roberta Bardini, Alessandro Savino, Stefano Di Carlo
Predicting Papillary Renal Cell Carcinoma Prognosis Using Integrative Analysis of Histopathological Images and Genomic Data

Renal cell carcinoma (RCC) is a common malignant tumor of the adult kidney, with the papillary subtype (pRCC) as the second most frequent. There is a need to improve evaluative criteria for pRCC due to overlapping diagnostic characteristics among RCC subtypes. To create a better prognostic model for pRCC, we proposed an integration of morphologic and genomic features. Matched images and genomic data from The Cancer Genome Atlas were used. Image features were extracted using CellProfiler, and prognostic image features were selected using least absolute shrinkage and selection operator and support vector machine algorithms. Eigengene modules were identified using weighted gene co-expression network analysis. Risk groups based on prognostic features were significantly distinct (p < 0.05) according to Kaplan-Meier analysis and log-rank test results. We used two image features and nine eigengene modules to construct a model with the Random Survival Forest method, measuring 11-, 16-, and 20-month areas under the curve (AUC) of a time-dependent receiver operating characteristic curve. The integrative model (AUCs: 0.877, 0.769, and 0.811) outperformed models trained with eigengenes alone (AUCs: 0.75, 0.733, and 0.785) and morphological features alone (AUCs: 0.593, 0.523, and 0.603). This suggests that an integrative prognostic model based on histopathological images and genomic features could significantly improve survival prediction for pRCC patients and assist in clinical decision-making.

Shaira L. Kee, Michael Aaron G. Sy, Samuel P. Border, Nicholas J. Lucarelli, Akshita Gupta, Pinaki Sarder, Marvin C. Masalunga, Myles Joshua T. Tan

Image Visualization and Signal Analysis

Frontmatter
Medical X-ray Image Classification Method Based on Convolutional Neural Networks

Artificial intelligence and machine learning, including convolutional neural networks, are increasingly entering the field of healthcare and medicine. The aim of this study is to optimize the learning process of convolutional neural networks through X-ray image pre-processing. A model for optimizing the overall architecture of a classifying convolutional neural network for chest X-rays by reducing the total number of convolutional operations is presented. The experimental results prove the successful application of the optimization process to the training of classification convolutional networks. There is a significant reduction in the training time of each epoch in the optimized convolutional networks: on the order of 25% for the network with an input layer size of 124 × 124 and about 27% for the network with an input layer size of 122 × 122. The method can be applied in any field of image classification in which the informative image regions are grouped and subject to segmentation.

Veska Gancheva, Tsviatko Jongov, Ivaylo Georgiev
Digital Breast Tomosynthesis Reconstruction Techniques in Healthcare Systems: A Review

Digital Breast Tomosynthesis (DBT) images are widely used to increase breast cancer detection and reduce recall rates in healthcare systems for breast cancer detection. In the field of medical imaging, computer-aided diagnosis (CAD) systems are used to analyze this type of image. Generally, in order to achieve early detection of breast cancer, these CAD systems start with the reconstruction of the image, followed by the pre-processing step and then segmentation and classification. However, the post-acquisition techniques of DBT can impact the detection and diagnosis of breast cancer and bias the final decision of computer-aided detection and diagnosis systems. This applies mainly to the reconstruction phase in computer-aided detection systems, which helps prepare the DBT images for further analysis, such as segmentation and classification of abnormalities. In this paper, we present a survey of different techniques for DBT reconstruction, which we compare theoretically in terms of their advantages and drawbacks, particularly for healthcare systems dedicated to breast cancer detection.

Imane Samiry, Ilhame Ait Lbachir, Imane Daoudi, Saida Tallal, Sayouti Adil
BCAnalyzer: A Semi-automated Tool for the Rapid Quantification of Cell Monolayer from Microscopic Images in Scratch Assay

The scratch assay is a simple and low-cost approach to evaluate the speed and character of cell migration in vitro. The principle is based on online imaging of the "scratch" in the cell monolayer being filled with new cells from both edges in real time. Thus, the scratch assay represents a model of cell migration during wound healing and is compatible with imaging of live cells during migration under various conditions. For the quantitative assessment of the scratch area in microscopic images, we suggest a simple semi-automated two-step algorithm based on local edge density estimation, which does not require any preliminary learning or tuning, although it has a couple of parameters directly controllable by the end user to adjust the analysis resolution and sensitivity, respectively. Using several representative examples of cell lines, we explicitly show the effectiveness of the image segmentation and quantification of the cell monolayer and discuss the benefits and limitations of the proposed approach. A simple open-source software tool based on the proposed algorithm, with on-the-fly visualization allowing straightforward feedback by an investigator without any specific expertise in image analysis techniques, is freely available online at https://gitlab.com/digiratory/biomedimaging/bcanalyzer.

Aleksandr Sinitca, Airat Kayumov, Pavel Zelenikhin, Andrey Porfiriev, Dmitrii Kaplun, Mikhail Bogachev
Color Hippocampus Image Segmentation Using Quantum Inspired Firefly Algorithm and Merging of Channel-Wise Optimums

Color image segmentation is essential in medical image processing for delineating cells, tissues, lesion areas, etc. The hippocampus is an extension of the temporal lobe of the brain. This area of the brain has been intensively studied for its clinical significance, as it is the first and most severely affected structure in several neuropsychiatric conditions. Metaheuristic-algorithm-based optimal segmentation is a widely accepted method in the medical domain. In this work, a hybrid method called the quantum-inspired firefly algorithm (QIFA) has been implemented in a multi-core environment to perform color segmentation of hippocampus images in parallel. The parallel QIFA runs on the three channels, Red, Green, and Blue, of the input color image, and a subsequent merging is applied. The correlation has been considered as the objective function. Finally, a study has been carried out concerning various image segmentation evaluation parameters, and the proposed method has been compared to other metaheuristic algorithms. The analysis of the results shows that the method is effective for medical image segmentation. The speed-up of the technique has also been examined in detail for various image sizes and color levels.

Alokeparna Choudhury, Sourav Samanta, Sanjoy Pratihar, Oishila Bandyopadhyay
Breast Cancer Histologic Grade Identification by Graph Neural Network Embeddings

Deep neural networks are nowadays the state-of-the-art methodology for general-purpose image classification. As a consequence, such approaches are also employed in the context of histopathology biopsy image classification. This specific task is usually performed by separating the image into patches, giving them as input to the deep model, and evaluating the individual sub-part outputs. This approach has the main drawback of not considering the global structure of the input image and can prevent the discovery of relevant patterns spanning non-overlapping patches. Departing from this commonly adopted assumption, in this paper we propose to face the problem by representing the input with a proper embedding resulting from a graph representation built from the tissue regions of the image. This graph representation is capable of maintaining the image structure and considering the relations among its relevant parts. The effectiveness of this representation is shown for automatic tumor-grade identification in breast cancer, using publicly available datasets.

Salvatore Calderaro, Giosué Lo Bosco, Filippo Vella, Riccardo Rizzo
A Pilot Study of Neuroaesthetics Based on the Analysis of Electroencephalographic Connectivity Networks in the Visualization of Different Dance Choreography Styles

Neuroaesthetics allows us to understand how the brain responds to different artistic languages and, therefore, to broaden our knowledge of aesthetic judgments. The present pilot study is an interdisciplinary work that aims to differentiate aesthetic dance choreography styles and to demonstrate the influence of training, learning, enculturation, and familiarization with these styles on their perception by the brain, by means of neurophysiological measurements of EEG signals and neural connectivity network analysis techniques. To this end, EEGs of non-expert dancers are recorded while viewing two fragments (film clips) of classical and modern dance and during other control conditions. Measures of functional connectivity between recorded regions are obtained from phase synchronization measurements between pairs of EEG signals in each EEG frequency band (FB). The responses of each FB are evaluated from indices obtained from models of EEG connectivity networks (graphs and connectomes) constructed from graph theory and network-based statistics (NBS) in global and local contexts. Significant alterations in some of the indices are observed between different contrasts and conditions in certain areas and specific EEG connections, depending on the EEG frequency band under consideration. These first results therefore suggest the usefulness of this neuroaesthetic experimental paradigm. Moreover, these neuroaesthetic procedures may be of special interest in biomedicine because they provide knowledge about different languages that can be applied in therapies and treatments.
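A standard phase-synchronization index of the kind the abstract describes is the phase-locking value (PLV); the sketch below uses toy phase series rather than real EEG, and is only illustrative of the measure, not of the study's pipeline.

```python
import cmath

# Phase-locking value: magnitude of the mean unit phasor of the phase
# difference between two signals. 1 = perfect locking, near 0 = incoherent.
def plv(phases_a, phases_b):
    n = len(phases_a)
    return abs(sum(cmath.exp(1j * (a - b))
                   for a, b in zip(phases_a, phases_b)) / n)

locked  = [0.1 * k for k in range(100)]            # toy instantaneous phases
shifted = [0.1 * k + 0.5 for k in range(100)]      # same rhythm, constant lag
jumbled = [(2 * k) % 7 for k in range(100)]        # unrelated phase sequence

assert plv(locked, shifted) > 0.99   # constant lag: near-perfect locking
assert plv(locked, jumbled) < 0.5    # unrelated phases: weak locking
```

In practice the instantaneous phases would come from a Hilbert transform of band-filtered EEG, and the PLV values would feed the graph construction.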

Almudena González, José Meléndez-Gallardo, Julian J. Gonzalez

Machine Learning in Bioinformatics and Biomedicine

Frontmatter
Ethical Dilemmas, Mental Health, Artificial Intelligence, and LLM-Based Chatbots

The present study analyzes the bioethical dilemmas related to the use of chatbots in the field of mental health. A rapid review of scientific literature and media news was conducted, followed by systematization and analysis of the collected information. A total of 24 moral dilemmas were identified, cutting across the four bioethical principles and responding to the context and populations that create, use, and regulate them. Dilemmas were classified according to specific populations and their functions in mental health. In conclusion, bioethical dilemmas in mental health can be categorized into four areas: quality of care, access and exclusion, responsibility and human supervision, and regulations and policies for LLM-based chatbot use. It is recommended that chatbots be developed specifically for mental health purposes, with tasks complementary to the therapeutic care provided by human professionals, and that their implementation be properly regulated and have a strong ethical framework in the field at national and international levels.

Johana Cabrera, M. Soledad Loyola, Irene Magaña, Rodrigo Rojas
Cyclical Learning Rates (CLRs) for Improving Training Accuracies and Lowering Computational Cost

Prediction of different lung pathologies using chest X-ray images is a challenging task requiring robust training and testing accuracies. In this article, one-class classifier (OCC) and binary classification algorithms have been tested to classify 14 different diseases (atelectasis, cardiomegaly, consolidation, effusion, edema, emphysema, fibrosis, hernia, infiltration, mass, nodule, pneumonia, pneumothorax, and pleural thickening). We have utilized three different neural network architectures (MobileNetV1, AlexNet, and DenseNet-121) with different optimizers (SGD, Adam, and RMSProp) to compare the best possible accuracies. The cyclical learning rate (CLR), a hyperparameter tuning technique, was found to yield faster convergence of the cost towards the minimum of the cost function. Here, we present a unique approach: re-training, with CLRs, models previously trained for binary classification with a learning rate decay technique. In doing so, we found significant improvement in training accuracies for each of the selected conditions. Thus, utilizing CLRs in callback functions seems a promising strategy for image classification problems.
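As an illustration of the schedule concept the abstract refers to, here is a minimal sketch of Smith's triangular cyclical learning rate policy; the hyperparameter values are placeholders, not those used in the paper.

```python
import math

def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular cyclical learning rate (Smith, 2017): the LR ramps linearly
    from base_lr up to max_lr and back down over one cycle of 2*step_size
    iterations, then the cycle repeats."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# lr is base_lr at the start of a cycle, max_lr at the cycle midpoint
print(triangular_clr(0), triangular_clr(2000), triangular_clr(4000))
```

In practice such a function would be wrapped in a training callback (e.g. a Keras `LearningRateScheduler` or a PyTorch `CyclicLR` scheduler) so the rate updates every batch.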

Rushikesh Chopade, Aditya Stanam, Anand Narayanan, Shrikant Pawar
Relation Predictions in Comorbid Disease Centric Knowledge Graph Using Heterogeneous GNN Models

Disease comorbidity has been an important topic of research for the last decade and has become more prominent due to the recent outbreak of COVID-19. A comorbid condition due to multiple concurrent diseases is more fatal than a single disease. Comorbid conditions can be caused by different genetic factors as well as drug-related side effects in an individual. There are already successful methods for predicting comorbid disease associations. This disease-associated genetic or drug-related information can help infer further target factors that cause common diseases, which may in turn help identify effective drugs for treating a pair of concurrent diseases. In addition, the common drug side effects causing a disease phenotype, and the genes associated with it, can be helpful in finding important biomarkers for further prognosis of the comorbid disease. In this paper, we use the knowledge graph (KG) from our previous study to find target-specific relations beyond sole disease-disease associations. We use four different heterogeneous graph neural network models to perform link prediction among different entities in the knowledge graph and carry out a comparative analysis among them. Our best heterogeneous GNN model outperforms existing state-of-the-art models on a few target-specific relationships. Further, we predict several novel drug-disease, drug-phenotype, disease-phenotype, and gene-phenotype associations. These interrelated associations are then used to find the common phenotypes associated with a comorbid disease as well as those caused by the direct side effects of a treating drug. In this regard, our methodology also predicts some novel biomarkers and therapeutics for different fatal prevalent diseases.

Saikat Biswas, Koushiki Dasgupta Chaudhuri, Pabitra Mitra, Krothapalli Sreenivasa Rao
Inter-helical Residue Contact Prediction in α-Helical Transmembrane Proteins Using Structural Features

Residue contact maps offer a 2-d, reduced representation of 3-d protein structures and constitute a structural constraint and scaffold in structural modeling. Precise residue contact maps are not only helpful as an intermediate step towards generating effective 3-d protein models, but are also useful in their own right in identifying binding sites and hence providing insights about a protein’s functions. Indeed, many computational methods have been developed to predict residue contacts using a variety of features based on sequence, physicochemical properties, and co-evolutionary information. In this work, we set out to explore the use of structural information for predicting inter-helical residue contacts in transmembrane proteins. Specifically, we extract structural information from a neighborhood around a residue pair of interest and train a classifier to determine whether the residue pair is a contact point or not. To keep the task practical, we avoid using the 3-d coordinates directly; instead, we extract features such as relative distances and angles. Further, we exclude any structural information of the residue pair of interest itself from the input feature set in training and testing of the classifier. We compare our method to a state-of-the-art method that uses non-structural information on a benchmark data set. The results from experiments on held-out datasets show that our method achieves above 90% precision for the top L/2 and L inter-helical contacts, significantly outperforming the state-of-the-art method, and may serve as an upper bound on the performance attainable when using non-structural information. Further, we evaluate the robustness of our method by injecting Gaussian normal noise into PDB coordinates and hence into our derived features. We find that our model’s performance is robust to high noise levels.
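To illustrate the kind of coordinate-free geometric features described (relative distances and angles rather than raw 3-d coordinates), here is a minimal sketch; the function name and the choice of consecutive C-alpha atoms are illustrative assumptions, not the paper's exact feature set.

```python
import numpy as np

def pair_features(ca_i, ca_j, ca_i_next, ca_j_next):
    """Distance and inter-vector angle for a residue pair, from C-alpha
    coordinates. The backbone direction vectors (residue -> next residue)
    make the angle feature independent of the global reference frame."""
    distance = np.linalg.norm(ca_i - ca_j)
    v1 = ca_i_next - ca_i
    v2 = ca_j_next - ca_j
    cos_ang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle_deg = np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0)))
    return distance, angle_deg

# two residues 8 A apart whose local backbone directions are perpendicular
d, ang = pair_features(np.array([0.0, 0, 0]), np.array([8.0, 0, 0]),
                       np.array([0.0, 1, 0]), np.array([8.0, 0, 1]))
print(d, ang)
```

Features of this form, gathered over a neighborhood of residue pairs, would then be concatenated into the classifier's input vector.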

Aman Sawhney, Jiefu Li, Li Liao
Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs

Among the many proposed solutions in graph embedding, traditional random walk-based embedding methods have shown their promise in several fields. However, when the graph contains high-degree nodes, random walks often neglect low- or middle-degree nodes and tend to prefer stepping through high-degree ones instead. This results in random-walk samples providing a very accurate topological representation of neighbourhoods surrounding high-degree nodes, which contrasts with a coarse-grained representation of neighbourhoods surrounding middle and low-degree nodes. This in turn affects the performance of the subsequent predictive models, which tend to overfit high-degree nodes and/or edges having high-degree nodes as one of the vertices. We propose a solution to this problem, which relies on a degree normalization approach. Experiments with popular RW-based embedding methods applied to edge prediction problems involving eight protein-protein interaction (PPI) graphs from the STRING database show the effectiveness of the proposed approach: degree normalization not only improves predictions but also provides more stable results, suggesting that our proposal has a regularization effect leading to a more robust convergence.
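A minimal sketch of the idea, assuming one plausible form of degree normalization (down-weighting each neighbour by its degree before row-normalizing the transition matrix); the paper's exact formulation may differ.

```python
import numpy as np

def degree_normalized_transitions(adj, gamma=1.0):
    """Random-walk transition matrix in which each neighbour's weight is
    divided by that neighbour's degree**gamma, discouraging walks from
    collapsing onto hub nodes. gamma=0 recovers the uniform random walk."""
    degree = adj.sum(axis=1)
    # divide column j (incoming weight of node j) by deg(j)**gamma
    weights = adj / np.power(np.maximum(degree, 1), gamma)[np.newaxis, :]
    row_sums = weights.sum(axis=1, keepdims=True)
    return np.divide(weights, row_sums,
                     out=np.zeros_like(weights), where=row_sums > 0)

# star-like toy graph: node 0 is a hub (degree 3), node 3 a leaf
adj = np.array([[0, 1, 1, 1],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]], dtype=float)
P = degree_normalized_transitions(adj)
print(P[1])  # from node 1, the hub is now less likely than peer node 2
```

From node 1 the uniform walk would step to the hub with probability 0.5; after normalization the hub gets 0.4 and the degree-2 neighbour 0.6, illustrating the intended rebalancing.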

Luca Cappelletti, Stefano Taverni, Tommaso Fontana, Marcin P. Joachimiak, Justin Reese, Peter Robinson, Elena Casiraghi, Giorgio Valentini

Medical Image Processing

Frontmatter
Role of Parallel Processing in Brain Magnetic Resonance Imaging

Parallel processing is a technique in which a computation is distributed across more than one processor so that separate parts of an overall task can be handled at the same time. It is crucial for some medicine-related tasks, since it allows a program to perform time-efficient computation by carrying out several calculations simultaneously. Magnetic resonance imaging (MRI), in turn, is a medical imaging method that shows the form of anatomy and the biological processes of the human body. Parallel processing methods can usefully be applied to MRI with the aim of achieving real-time, interventional, and time-efficient image acquisition. Given the need for faster computation on brain MRI to obtain early, real-time feedback in medicine, this paper presents a systematic review of the literature on brain MRI, focusing on emerging applications of parallel processing methods for the analysis of brain MRIs. We investigate articles applying these kernels using literature matrices that cover their materials, methods, and journal types between 2013 and 2023, and we distill the most prominent key concepts of parallel processing methods.

Ayca Kirimtat, Ondrej Krejcar
Deep Learning Systems for the Classification of Cardiac Pathologies Using ECG Signals

In this paper, several deep learning models are analysed for the construction of an automated system to aid ECG classification. The methodology presented in this article begins with a study of the different alternatives for computing the discrete wavelet transform-based scalogram of an ECG. Then, several deep learning architectures are analysed. Owing to the large number of architectures in the literature, seven have been selected, as they enjoy a high degree of acceptance in the scientific community. The influence of the number of epochs used for training is also analysed. In addition to developing a classifier able to accurately solve the multi-class problem of deciding, given an ECG signal, which pathology the subject is suffering from (the main interest for a medical expert), we also rigorously analyse, through the use of a statistical tool (ANOVA), the impact the main functional blocks of our system have on its behaviour. As a novel result of this article, different homogeneous groups of deep learning systems are identified (from a statistical point of view, systems within a group have the same impact on the accuracy of the system). As can be seen in the results, there are four homogeneous groups, with the lowest-accuracy group obtaining an average classification accuracy of 76.48% and the best group an average accuracy of 83.83%.

Ignacio Rojas-Valenzuela, Fernando Rojas, Juan Carlos de la Cruz, Peter Gloesekoetter, Olga Valenzuela
Transparent Machine Learning Algorithms for Explainable AI on Motor fMRI Data

With the emergence of explainable artificial intelligence (xAI), two main approaches for tackling model explainability have been put forward. The first is the use of inherently simple and transparent models with easily understandable inner workings (interpretability) that can readily provide useful knowledge about the model’s decision-making process (explainability). The second is the development of interpretation and explanation algorithms that may shed light upon black-box models. This is particularly interesting to apply to fMRI data, as either approach can provide pertinent information about the brain’s underlying processes. This study aims to explore the capability of transparent machine learning algorithms to correctly classify motor fMRI data, whether more complex models inherently lead to a better prediction of the motor stimulus, and the capability of the Integrated Gradients method to explain a fully connected artificial neural network (FCANN) used to model motor fMRI data. The transparent machine learning models tested are Linear Regression, Logistic Regression, Naive Bayes, K-Neighbors, Support Vector Machine, and Decision Tree, while the Integrated Gradients method is tested on an FCANN with 3 hidden layers. It is concluded that the transparent models can accurately classify the motor fMRI data, with accuracies ranging from 66.75% to 85.0%. The best transparent model, multinomial logistic regression, outperformed the most complex model, the FCANN. Lastly, it is possible to extract pertinent information about the underlying brain processes via the Integrated Gradients method applied to the FCANN, by analyzing the spatial expression of the Independent Components most relevant to the FCANN’s decisions.
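The Integrated Gradients method itself is straightforward to sketch. The toy example below uses an analytic gradient in place of a trained FCANN and checks the completeness axiom (attributions sum to F(x) - F(baseline)); it is a sketch of the attribution rule, not the study's model.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Riemann (midpoint) approximation of Integrated Gradients:
    IG_i = (x_i - b_i) * mean over alpha of dF/dx_i at b + alpha*(x - b)."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# toy model F(x) = sum(x**2); its gradient is 2x, known in closed form
f = lambda x: np.sum(x ** 2)
grad = lambda x: 2 * x
x = np.array([1.0, 2.0, -1.0])
b = np.zeros(3)
ig = integrated_gradients(grad, x, b)
# completeness axiom: attributions sum to F(x) - F(baseline)
print(np.allclose(ig.sum(), f(x) - f(b), atol=1e-6))  # True
```

With a real network, `grad_fn` would be the autodiff gradient of the class logit with respect to the input volume, and the attributions would be mapped back onto brain voxels.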

José Diogo Marques dos Santos, David Machado, Manuel Fortunato
A Guide and Mini-Review on the Performance Evaluation Metrics in Binary Segmentation of Magnetic Resonance Images

Eight previously proposed segmentation evaluation metrics for brain magnetic resonance images (MRI), namely sensitivity (SE), specificity (SP), false-positive rate (FPR), false-negative rate (FNR), positive predictive value (PPV), accuracy (ACC), Jaccard index (JAC), and Dice score (DSC), are presented and discussed in this paper. These evaluation metrics can be classified into two groups: pixel-wise metrics and area-wise metrics. We also distill the most prominent papers on brain MRI segmentation evaluation metrics published between 2021 and 2023 into a detailed literature matrix. The identification of illness or tumor areas using brain MRI image segmentation is a large area of research; however, there is no single agreed-upon evaluation metric for brain MRI segmentation results in the current literature. Moreover, pixel-wise metrics should be supported by area-wise metrics such as DSC when evaluating image segmentation results, and each metric should be compared with the others for a better evaluation.
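All eight metrics derive from the four confusion-matrix counts; a minimal reference implementation for a pair of flat binary masks (assuming, for brevity, that no denominator is zero):

```python
def binary_segmentation_metrics(pred, truth):
    """The eight metrics computed from TP/TN/FP/FN over flat 0/1 masks."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    return {
        "SE":  tp / (tp + fn),               # sensitivity (recall)
        "SP":  tn / (tn + fp),               # specificity
        "FPR": fp / (fp + tn),               # false-positive rate
        "FNR": fn / (fn + tp),               # false-negative rate
        "PPV": tp / (tp + fp),               # positive predictive value
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "JAC": tp / (tp + fp + fn),          # Jaccard index (area-wise)
        "DSC": 2 * tp / (2 * tp + fp + fn),  # Dice score (area-wise)
    }

print(binary_segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

Note how JAC and DSC ignore the true negatives (background pixels), which is why they complement the pixel-wise metrics on images dominated by background.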

Ayca Kirimtat, Ondrej Krejcar

Next Generation Sequencing and Sequence Analysis

Frontmatter
The Pathogenetic Significance of miR-143 in Atherosclerosis Development

Our pilot studies of blood plasma in patients with comorbidities allow miR-143 to be regarded as a potential biomarker of atherosclerosis, but the literature data on the dynamics of miR-143 expression are contradictory. Continuing the study to verify the results by “wet lab” methods is costly, so it is advisable first to determine whether the putative biomarker has pathogenetic relevance. The aim of the study was to establish and assess the role of miR-143 in atherosclerosis using comprehensive bioinformatics analysis and to identify possible pathways by which it enters the mechanism of the pathology. Using open sources, two gene expression datasets from atherosclerotic plaques were selected; then only the differentially expressed genes (DEGs) shared by both datasets were identified, from which an extended protein-protein interaction (PPI) network was constructed. The next step was network analysis (identification of clusters and hub genes within them) and construction of miRNA-hub gene regulatory networks. The analysis revealed that miR-143 is one of the central miRNAs whose action may be associated with suppression of atherosclerosis formation through its targeting of several hub genes: TLR2, TNF, and LYN. However, another target is ITGB1, whose reduction increases autophagy and activates the inflammatory response. Based on the established topological and functional characteristics, miR-143 is of interest for further verification as a biomarker and for possible therapeutic applications in atherosclerosis. We also discuss the already known effects of miR-143 on the NF-κB pathway, in the context of atherosclerosis (via TNF and TLR2).

Mikhail Lopatin, Maria Vulf, Maria Bograya, Anastasia Tynterova, Larisa Litvinova
Comparison of VCFs Generated from Different Software in the Evaluation of Variants in Genes Responsible for Rare Thrombophilic Conditions

As part of the implementation and validation of an optimal diagnostic approach based on high-throughput sequencing on the Ion Torrent platform in the diagnosis of the rare thrombophilic conditions of protein S (PS), protein C (PC), and antithrombin (AT) deficiency, we compared data from three different software tools, Torrent Suite, Ion Reporter, and NextGene, assessing their performance and accuracy in the analysis of each detected sequence variant. A cohort of 31 patients was selected for PS (7), PC (13), and AT (11) deficiency based on defined indication criteria. Within these patient groups, a mutation detection rate of 67.7% was observed. In a cohort of 10 patients sequenced in a single sequencing run, the three evaluated software tools, at their baseline settings, detected 16, 19, and 27 variants in the PROS1 gene; 17, 17, and 19 variants in the PROC gene; and 15, 15, and 16 variants in the SERPINC1 gene. For data generated by the Ion Torrent platform, software from the same provider seems to be more suitable, mainly because of the quality of its false-positive filtering. For further evaluation of the validity of the software used, it will be necessary to expand the cohort of examined patients.

R. Vrtel, P. Vrtel, R. Vodicka
Uterine Cervix and Corpus Cancers Characterization Through Gene Expression Analysis Using the KnowSeq Tool

The characterization of cancer through gene expression quantification data analysis is a powerful and widely used approach in cancer research. This paper describes two experiments that demonstrate its potential in identifying differentially expressed genes (DEGs) and accurately predicting cancer subtypes. To achieve this, RNA-seq data was obtained from the TCGA database and subsequently preprocessed and analyzed using the KnowSeq package from Bioconductor. The first experiment focuses on identifying DEGs in healthy, cervical cancerous, and uterine corpus cancerous tissues. The kNN classifier was employed to evaluate the utility of these genes in predicting that a sample belongs to one of these three classes. A gene signature consisting of only three genes produced remarkable results in a 5-fold cross-validation assessment, with overall test accuracy and F1 values of 99.33% and 96.73%, respectively. The paper provides ontological enrichment, associated diseases, and pathways of the gene signature to shed light on the molecular mechanisms involved in both cancers. The second experiment extends the work by classifying cervical cancer samples into their two most common histological types: adenocarcinoma and squamous cell carcinoma. Using a single gene, the study was able to achieve 100% test accuracy in a 5-fold cross-validation process. Additionally, the classification of an adenosquamous sample into one of these two categories, depending on the number of genes used, was also examined. Overall, these experiments demonstrate the potential of these techniques to improve cancer diagnosis and treatment. Moreover, the study provides valuable insights into the underlying molecular mechanisms of cervix and uterine corpus cancers, laying the groundwork for further research in this field.
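A minimal sketch of the kNN classification step on an expression matrix (toy data; the KnowSeq pipeline's preprocessing, DEG extraction, and fold splitting are not reproduced here):

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=3):
    """Plain kNN on a (samples x genes) expression matrix: majority vote
    among the k nearest training samples by Euclidean distance."""
    preds = []
    for x in test_X:
        dists = np.linalg.norm(train_X - x, axis=1)
        nearest_labels = train_y[np.argsort(dists)[:k]]
        values, counts = np.unique(nearest_labels, return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)

# toy two-gene signature separating two tissue classes
train_X = np.array([[0.0, 0], [0, 1], [10, 10], [10, 11]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(train_X, train_y, np.array([[0.0, 0.5], [10.0, 10.5]])))
```

With a three-gene signature the same procedure applies in a 3-dimensional feature space, which is what makes such small signatures attractive: the classifier stays interpretable and cheap to validate by cross-validation.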

Lucía Almorox, Luis Javier Herrera, Francisco Ortuño, Ignacio Rojas

Sensor-Based Ambient Assisted Living Systems and Medical Applications

Frontmatter
Smart Wearables Data Collection and Analysis for Medical Applications: A Preliminary Approach for Functional Reach Test

The Functional Reach Test (FRT) is a commonly used clinical tool to evaluate dynamic balance and fall risk in older adults and individuals with specific neurological conditions. Several studies have highlighted the importance of the FRT as a reliable and valid measure for assessing functional balance and fall risk in diverse populations. Additionally, the FRT is sensitive to changes in balance function over time and can provide critical information for designing rehabilitation programs to improve balance and reduce the risk of falls. The FRT has also been used as a screening tool for identifying individuals who may benefit from further assessment or intervention. Thus, the FRT is a valuable clinical instrument for assessing functional balance and fall risk and should be incorporated into routine clinical practice. This paper describes preliminary results and future directions for implementing the FRT with various sensors from smartphones or smart wearables, providing valuable indicators to aid professional healthcare practitioners in evaluating and following up on elderly patients, with possible extension to other age groups.

João Duarte, Luís Francisco, Ivan Miguel Pires, Paulo Jorge Coelho
Radar Sensing in Healthcare: Challenges and Achievements in Human Activity Classification & Vital Signs Monitoring

Driven by its contactless sensing capabilities and the lack of optical images being recorded, radar technology has been recently investigated in the context of human healthcare. This includes a broad range of applications, such as human activity classification, fall detection, gait and mobility analysis, and monitoring of vital signs such as respiration and heartbeat. In this paper, a review of notable achievements in these areas and open research challenges is provided, showing the potential of radar sensing for human healthcare and assisted living.

Francesco Fioranelli, Ronny G. Guendel, Nicolas C. Kruse, Alexander Yarovoy
Backmatter
Metadata
Title
Bioinformatics and Biomedical Engineering
edited by
Ignacio Rojas
Olga Valenzuela
Fernando Rojas Ruiz
Luis Javier Herrera
Francisco Ortuño
Copyright Year
2023
Electronic ISBN
978-3-031-34960-7
Print ISBN
978-3-031-34959-1
DOI
https://doi.org/10.1007/978-3-031-34960-7