
2020 | Book

Bioinformatics and Biomedical Engineering

8th International Work-Conference, IWBBIO 2020, Granada, Spain, May 6–8, 2020, Proceedings

Editors: Prof. Ignacio Rojas, Prof. Olga Valenzuela, Prof. Fernando Rojas, Luis Javier Herrera, Dr. Francisco Ortuño

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This volume constitutes the proceedings of the 8th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2020, held in Granada, Spain, in May 2020.
The 73 papers presented in these proceedings were carefully reviewed and selected from 241 submissions. The papers are organized in topical sections as follows: Biomarker Identification; Biomedical Engineering; Biomedical Signal Analysis; Bio-Nanotechnology; Computational Approaches for Drug Design and Personalized Medicine; Computational Proteomics and Protein-Protein Interactions; Data Mining from UV/VIS/NIR Imaging and Spectrophotometry; E-Health Technology, Services and Applications; Evolving Towards Digital Twins in Healthcare (EDITH); High Performance in Bioinformatics; High-Throughput Genomics: Bioinformatic Tools and Medical Applications; Machine Learning in Bioinformatics; Medical Image Processing; Simulation and Visualization of Biological Systems.

Table of Contents

Frontmatter

Biomarker Identification

Frontmatter
Identification of Coding Regions in Prokaryotic DNA Sequences Using Bayesian Classification

The identification of protein-coding regions in genomic DNA sequences is a well-known problem in computational genomics. Various computational algorithms can be employed to achieve the identification process. The rapid advances in this field have motivated the development of innovative engineering methods that allow for further analysis and modeling of many processes in molecular biology. The proposed algorithm utilizes well-known concepts from communications theory, such as correlation, the maximal ratio combining (MRC) algorithm, and filtering techniques, to create a signal whose maxima and minima indicate coding and noncoding regions, respectively. The algorithm is applied to several prokaryotic genome sequences, and two Bayesian classifiers are designed to test and evaluate its performance. The obtained simulation results show that the algorithm can efficiently and accurately detect protein-coding regions, as demonstrated by sensitivity and specificity values comparable to well-known gene detection methods in prokaryotes. These results further verify the correctness and biological relevance of using communications theory concepts for genomic sequence analysis.

Mohammad Al Bataineh
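
The communications-theory view of coding-region detection can be illustrated with a minimal sketch: the classic period-3 property of coding DNA, measured with binary indicator sequences and a sliding-window DFT. This is a generic illustration of the underlying signal idea, not the authors' MRC/Bayesian pipeline; the sequence and all parameters below are invented for the demo.

```python
import numpy as np

def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences (A, C, G, T)."""
    dna = dna.upper()
    return {b: np.array([1.0 if ch == b else 0.0 for ch in dna]) for b in "ACGT"}

def period3_power(dna, window=120, step=3):
    """Sliding-window spectral power at period 3 (frequency 1/3), summed over
    the four indicator sequences; high values suggest coding regions."""
    seqs = indicator_sequences(dna)
    basis = np.exp(-2j * np.pi / 3 * np.arange(window))  # DFT basis at f = 1/3
    centers, power = [], []
    for start in range(0, len(dna) - window + 1, step):
        p = sum(abs(np.dot(seqs[b][start:start + window], basis)) ** 2
                for b in "ACGT")
        centers.append(start + window // 2)
        power.append(p)
    return np.array(centers), np.array(power)

# Toy demo: a strongly period-3 ("codon-like") block inside random background
rng = np.random.default_rng(0)
background = "".join(rng.choice(list("ACGT"), size=300))
genome = background + "ATG" * 100 + background  # coding-like block at 300..599
centers, power = period3_power(genome)
```

On this toy input the power peaks inside the repetitive block, which is the behavior a maxima/minima-based coding signal relies on.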
Identification of Common Gene Signatures in Microarray and RNA-Sequencing Data Using Network-Based Regularization

Microarray and RNA-sequencing (RNA-seq) gene expression data alongside machine learning algorithms are promising in the discovery of new cancer biomarkers. However, even though they are similar in purpose, there are some fundamental differences between the two techniques. We propose a methodology for cross-platform integration and biomarker discovery based on network-based regularization via the Twin Networks Recovery (twiner) penalty, as a strategy to enhance the selection of breast cancer gene signatures that have similar correlation patterns in both platforms. In a classification setting based on sparse logistic regression (LR), with tumor samples from both RNA-seq and microarray and normal tissue samples as classes, twiner achieved precision-recall accuracies of 99.71% and 99.57% in the training and test set, respectively. Moreover, the survival analysis results validated the biological relevance of the signatures identified by twiner. Therefore, by leveraging the existing data for microarray and RNA-seq, a single biological conclusion can be reached, independent of each technology.

Inês Diegues, Susana Vinga, Marta B. Lopes
Blood Plasma Trophic Growth Factors Predict the Outcome in Patients with Acute Ischemic Stroke

Stroke is an acute disorder of the central nervous system and a leading cause of mortality and disability in the population. Dynamic assessment of trophic growth factor expression is a promising tool to predict the outcome of ischemic stroke. We investigated the concentration dynamics of brain-derived neurotrophic factor (BDNF) and vascular endothelial growth factor (VEGF) in the blood plasma of patients with acute ischemic stroke. 56 patients took part in the study. Venous blood was collected from all patients on the 1st, 7th and 21st days of their hospital stay. BDNF and VEGF plasma concentrations were measured using ELISA. Our study shows that serial dynamic measurements, rather than single measurements, of BDNF plasma concentration in the acute period of ischemic stroke have prognostic significance. An increase in BDNF plasma concentration on day 7 compared with day 1 was significantly associated with a better clinical outcome of acute ischemic stroke. Extremely high VEGF plasma concentrations (more than 260 pg/mL) on days 1 and 7 from the ischemic stroke onset were significantly associated with a worse clinical outcome on day 21 and a less favorable rehabilitation prognosis. Serial measurement of plasma concentrations of trophic growth factors in patients with ischemic stroke presents a rather simple, reliable and minimally invasive method of dynamic assessment of the clinical course of acute ischemic stroke and early outcome prediction.

Valeriia Roslavtceva, Evgeniy Bushmelev, Pavel Astanin, Tatyana Zabrodskaya, Alla Salmina, Semen Prokopenko, Vera Laptenkova, Michael Sadovsky
Analyzing the Immune Response of Neoepitopes for Personalized Vaccine Design

In the last few years, the importance of neoepitopes for the development of personalized antitumor vaccines has increased remarkably. These epitopes are considered to generate a strong immune reaction, while their non-mutated versions, which sometimes differ in only a single amino acid, do not generate a response at all. In order to study whether, regardless of immune tolerance, neoepitopes are quantitatively more immunogenic than the original sequences, we obtained samples of mutated and non-mutated epitopes from six patients with cutaneous melanoma at different stages, and then compared them. More precisely, we used several bioinformatic tools to study certain properties of the epitopes, such as class I and class II HLA binding affinity, and found that some of these are in fact increased in the mutated versions, which supports the hypothesis and also reinforces the use of neoepitopes for cancer vaccine design.

Iker Malaina, Leire Legarreta, Mª Dolores Boyano, Santos Alonso, Ildefonso M. De la Fuente, Luis Martinez
A Data Integration Approach for Detecting Biomarkers of Breast Cancer Survivability

We introduce a network-based approach to identify subnets of functionally-related genes for predicting 5-year survivability of breast cancer patients treated with chemotherapy, hormone therapy, and a combination of these. A gene expression dataset and a protein-protein interaction network are integrated to construct a weighted graph, where edge weight expresses the power of the two corresponding genes in predicting the class. We propose a scoring criterion to measure the density of a weighted sub-graph, which is also an estimation of its predictive power. Thus, we can identify an optimally-dense sub-network for each seed gene, and then evaluate that sub-network with a classification method. Finally, among the sub-networks whose classification performance is greater than a given threshold, we search for an optimal set of sub-networks that can further improve classification performance via a voting scheme. We significantly improve on the results of existing approaches. For each type of treatment, our best prediction model reaches 85% accuracy or more. Many of the selected sub-networks used to construct the voting models contain breast/other cancer-related genes including SP1, TP53, MYC, NOG, and many more, providing evidence for downstream analysis.

Huy Quang Pham, Luis Rueda, Alioune Ngom
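
The voting scheme over selected sub-networks can be sketched as a simple majority vote across per-sub-network classifiers. The predictions below are invented for illustration, and the tie-breaking rule is an assumption, not taken from the paper.

```python
def majority_vote(predictions):
    """predictions: list of per-classifier label lists (0/1), one list per
    sub-network classifier. Returns per-sample majority labels, breaking
    ties toward label 1 (an assumed convention)."""
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = [p[i] for p in predictions]
        voted.append(1 if sum(votes) * 2 >= len(votes) else 0)
    return voted

# Three hypothetical sub-network classifiers, five patients
# (1 = predicted to survive 5 years)
preds = [
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
]
voted = majority_vote(preds)
```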

Biomedical Engineering

Frontmatter
Effects of the Distribution in Space of the Velocity-Inlet Condition in Hemodynamic Simulations of the Thoracic Aorta

In the present paper, the effects of the spatial distribution of the inlet velocity in numerical simulations of the thoracic aorta have been investigated. First, the results obtained by considering an in-vivo measured inlet velocity distribution are compared with those obtained for a simulation having the same flow rate waveform and a plug flow condition at the inlet section. The results of the two simulations are consistent in terms of flow rate waveform, but differences are present in the pressure range and in the wall shear stresses, especially in the foremost part of the ascending aorta. This motivates a stochastic sensitivity analysis on the effect of the spatial distribution of the inlet velocity. This distribution is modeled through a truncated-cone shape, and the ratio between the upper and lower bases is selected as the uncertain parameter. The uncertainty is propagated through the numerical model, and a continuous response surface of the output quantities of interest in the parameter space can be recovered through a “surrogate” model. A stochastic method based on the generalized Polynomial Chaos (gPC) approach is used herein. The selected parameter appears to have a significant influence on the velocity distribution in the ascending aorta, whereas it has a negligible effect in the descending part. This, in turn, produces significant effects on the wall shear stresses in the ascending aorta, confirming the need for patient-specific inlet conditions when the hemodynamics and stresses of this region are of interest.

Maria Nicole Antonuccio, Alessandro Mariotti, Simona Celi, Maria Vittoria Salvetti
Coupled Electro-mechanical Behavior of Microtubules

In this contribution, the coupled electro-mechanical behavior of the microtubules has been systematically investigated utilizing a continuum-based finite element framework. A three-dimensional computational model of a microtubule has been developed for predicting the electro-elastic response of the microtubule subjected to external forces. The effects of the magnitude and direction of the applied forces on the mechanics of microtubule have been evaluated. In addition, the effects of variation of microtubule lengths on the electro-elastic response subjected to external forces have also been quantified. The results of numerical simulation suggest that the electro-elastic response of microtubule is significantly dependent on both the magnitude and direction of the applied forces. It has been found that the application of shear force results in the attainment of higher displacement and electric potential as compared to the compressive force of the same magnitude. It has been further observed that the output potential is linearly proportional to the predicted displacement and the electric potential within the microtubule. The increase in the length of microtubule significantly enhances the predicted piezoelectric potential under the application of different forces considered in the present study. It is expected that the reported findings would be useful in different avenues of biomedical engineering, such as biocompatible nano-biosensors for health monitoring, drug delivery, noninvasive diagnosis and treatments.

Sundeep Singh, Roderick Melnik
Comparison of Corneal Morphologic Parameters and High Order Aberrations in Keratoconus and Normal Eyes

The aim of this study is to evaluate the influence of corneal geometry on the optical system’s aberrations, and its usefulness as a diagnostic criterion for keratoconus. 159 normal eyes (normal group, mean age 37.8 ± 11.6 years) and 292 eyes with a diagnosis of keratoconus (keratoconus group, mean age 42.2 ± 17.6 years) were included in this study. All eyes received a comprehensive ophthalmologic examination. A virtual 3D model of each eye was made using CAD software, and different anatomical parameters related to surface and volume were measured. Statistically significant differences were found for all anatomical parameters (all p < 0.001). AUROC analysis showed that all parameters reached values above 0.7, with the exception of the total corneal surface area (TCSAA-S). In conclusion, the methodology explained in this research, which is based on anatomical parameters obtained from a virtual corneal model, allows analysis of the diagnostic value of the correlation between corneal geometry and optical aberrations in keratoconus pathology.

Jose Sebastián Velázquez, Francisco Cavas, Jose Miguel Bolarín, Jorge Alió
Graph Databases for Contact Analysis in Infections Using Spatial Temporal Models

Infections acquired in healthcare settings (nosocomial infections) have become one of the main health problems in acute care centers. Some of epidemiologists’ efforts are focused on studying patients’ traceability and determining the main factors that lead to the appearance of these infections. However, specialists demand new technology to ease such analysis. In this work, we explore the capacity of alternative information storage technologies, such as graph databases (GDBs). GDBs, unlike the traditional (relational) databases present in health information systems, have remarkable expressiveness for modeling and querying highly inter-linked concepts in datasets. In particular, we focus on the advantages GDBs can offer in the analysis of contacts between patients diagnosed with a bacterial nosocomial infection in a hospital setting. The contributions of our research are the following: a design and implementation of the domain has been carried out, with the ability to model any hospital architectural structure on several levels, as well as to represent the clinical events associated with patients, thus providing spatial and temporal modeling. Finally, we study query expressiveness and performance for the analysis of contacts in infection spread.

Lorena Pujante, Manuel Campos, Jose M. Juarez, Bernardo Canovas-Segura, Antonio Morales
Window Functions in Rhythm Based Biometric Authentication

Emerging new technologies and new threats entail additional security; therefore, biometric authentication is key to protecting passwords by classifying unique biometric features. One recently proposed protocol is rhythm-based authentication, which deals with the dominant frequency components of keystroke signals. However, the frequency components alone are not enough to understand the whole keystroke sequence; the characteristics of a password can only be analyzed by transformations that provide information on when each dominant frequency arises. In this paper, the biometric signal generated from keystroke data is divided into windows by various window functions and sizes for frequency and time localization. As a guide for signal processing in biometrics and biomedicine, we compared Hamming and Blackman window functions of various sizes in short-time Fourier transforms of the signals and found that Blackman is more appropriate for biometric signal processing.

Orcan Alpar, Ondrej Krejcar
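
The Hamming-versus-Blackman comparison can be illustrated numerically: a window's peak side-lobe level controls how much spectral leakage it admits in an STFT, and Blackman's side lobes sit well below Hamming's. The following NumPy sketch is a generic measurement of that property, not the authors' keystroke pipeline.

```python
import numpy as np

def peak_sidelobe_db(window):
    """Peak side-lobe level of a window's spectrum, in dB relative to the
    main lobe (more negative = less leakage)."""
    spec = np.abs(np.fft.rfft(window, 16 * len(window)))  # zero-padded spectrum
    spec /= spec.max()
    i = 1
    while i < len(spec) - 1 and spec[i] <= spec[i - 1]:   # walk down the main lobe
        i += 1
    return 20 * np.log10(spec[i:].max())                  # highest remaining lobe

n = 256
hamming_level = peak_sidelobe_db(np.hamming(n))
blackman_level = peak_sidelobe_db(np.blackman(n))
```

The trade-off is the usual one: Blackman buys its lower side lobes with a wider main lobe, i.e. coarser frequency localization per window.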
Relationships Between Muscular Power and Bone Health Parameters in a Group of Young Lebanese Adults

The aim of the current study was to explore the relationships between lower limb muscular power and bone variables (bone mineral content (BMC), bone mineral density (BMD), hip geometry indices and trabecular bone score (TBS)) in a group of young Lebanese adults. 29 young Lebanese men and 31 young Lebanese women whose ages range between 18 and 32 years participated in this study. Body weight and height were measured, and body mass index (BMI) was calculated. Body composition and bone variables were measured by DXA. DXA measurements were completed for the whole body (WB), lumbar spine (L2–L4), total hip (TH) and femoral neck (FN). Hip geometry parameters including cross-sectional area (CSA), cross-sectional moment of inertia (CSMI), section modulus (Z), strength index (SI) and buckling ratio (BR) were derived by DXA. Trabecular bone score was also derived by DXA. Horizontal jump (HJ), vertical jump, vertical jump maximum power, force-velocity maximum power and 20-m sprint performance were measured or calculated by using validated fitness tests. In men, fat mass percentage was negatively correlated to TH BMD, FN BMD, CSA, CSMI, Z and SI. In women, weight, BMI, lean mass and fat mass were positively correlated to WB BMC, CSMI and Z. Regarding physical performance variables, horizontal jump performance and force-velocity maximal power were positively correlated to TH BMD, FN BMD, CSA and Z in men. Vertical jump maximal power was positively correlated to WB BMC in women. 20-m sprint performance was negatively correlated to FN BMD, CSA, Z and SI in men. In conclusion, the current study suggests that force-velocity maximum power is a positive determinant of BMD and hip geometry indices in men but not in women.

Patchina Sabbagh, Pierre Kamlé, Antonio Pinti, Georgette Farah, Hayman Saddick, Eddy Zakhem, Boutros Finianos, Gautier Zunquin, Georges Baquet, Rawad El Hage

Biomedical Signal Analysis

Frontmatter
Thermal Behavior of Children During American Football Sports Training

This paper shows the thermal behavior of the nose, the fingertips and the lower limbs (right and left) before and after American football training in fourteen children who are members of a private football academy. During the one-hour training, four different activities were carried out: warm-up, a 40-yard speed test, long jump tests and three-cone drill tests, with each stage lasting fifteen minutes. For the statistical analysis of the thermal information, a methodology was developed that allows the thermal matrix to be calculated and thus the thermal values of each point to be obtained. Once the analysis was carried out, significant changes (temperature decrease) were found in the analyzed regions of interest, except for the thumb. Subsequently, a temperature decrease index was obtained, and it was found that the greatest temperature change occurs in the area of the lower extremities.

Irving A. Cruz-Albarrán, Pierre Burciaga-Zuñiga, Ma. Guadalupe Perea-Ortiz, Luis A. Morales-Hernandez
Positioning Algorithm for Arterial Blood Pressure Pneumatic Sensor

The paper is devoted to the algorithmic solution of quality control over pneumatic blood pressure sensor positioning. Previously, this problem was solved by the operator based on a subjective assessment of the presence or absence of a pulse wave in the observed signal and an estimation of its quality. Recent studies have led us to a simple algorithm for automatically evaluating the accuracy of positioning. The algorithm is based on the variability of the intervals between the side peaks that the multiscale autocorrelation function can manifest as part of its structure. Since this value is closely related to the quasi-periodicity of the signal, the algorithm essentially estimates the degree of periodicity of the signal, which is high in the presence of a pulse wave and low in its absence. In addition to the general principle of quasi-periodicity estimation, much empirical information has been accumulated on the necessary preliminary normalization of the signal, the censoring of the side peaks to be considered, the numerical values of the comparison thresholds, etc. The main ideas of the algorithm are illustrated by examples of processing real data obtained by positioning the developed pneumatic sensor.

Viacheslav Antsiperov, Gennady Mansurov
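
The quasi-periodicity idea can be sketched in a few lines: normalize the autocorrelation and take the height of its largest side peak as a periodicity score, high for a pulse-like signal and low for noise. This is a generic illustration of the principle only; the paper's normalization, side-peak censoring and thresholds are empirical and not reproduced here.

```python
import numpy as np

def periodicity_score(x):
    """Quasi-periodicity score: height of the largest autocorrelation side
    peak (lag > 0) relative to the zero-lag value. Close to 1 for a strongly
    periodic signal, close to 0 for noise."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    ac = ac / ac[0]
    i = 1
    while i < len(ac) - 1 and ac[i + 1] < ac[i]:       # descend past the main lobe
        i += 1
    return float(ac[i:].max())

# Toy signals: a noisy "pulse wave" (period 80 samples) versus pure noise
rng = np.random.default_rng(1)
t = np.arange(2000)
pulse_like = np.sin(2 * np.pi * t / 80) + 0.1 * rng.standard_normal(t.size)
noise = rng.standard_normal(t.size)
```

A threshold on this score then separates "sensor positioned on a pulse wave" from "no pulse wave", which is the decision the paper automates.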
An Approach to Detecting and Eliminating Artifacts from the Sleep EEG Signals

The objective of our ongoing work is to develop an algorithm for detecting and eliminating artifacts from EEG polysomnographic signals, thus helping practitioners in their diagnostics. EEG signals play an important role in the identification of brain activity and thus in sleep stage classification. However, it is well known that recorded EEG signals may be contaminated with artifacts that affect their analysis. Our short paper proposes methods for detecting and eliminating non-physiological and physiological artifacts, using filtering for the former and a mixed method based on ICA and wavelets for the latter.

Rym Nihel Sekkal, Fethi Bereksi-Reguig, Nabil Dib, Daniel Ruiz-Fernandez

Bio-Nanotechnology

Frontmatter
New Genomic Information Systems (GenISs): Species Delimitation and IDentification

Genomic Information Systems (GenISs) have recently been proposed to provide a universal framework for feature extraction, dimensionality reduction and more effective processing of genomic data. They are based on methodologies more anchored in biochemical reality and exploit newly discovered structure of DNA spaces to extract and represent genomic data in compact data structures rich enough to answer critical questions about the original organisms, including phylogenies, species identification and, more recently, phenotypic information. They work from DNA sequence alone (possibly including full genomes), in a matter of minutes or hours, and produce answers consistent with well-established and accepted biological knowledge. Here, we introduce a second family of GenISs based on further structural properties of DNA spaces and demonstrate that they can also be used to provide principled, general and intuitive solutions to fundamental questions in biology such as “What exactly is a biological species?” Current answers to these all-important questions have remained dependent on specific taxa and subject to analyst choices. We further discuss other applications to be explored in the future, including universal biological taxonomies in the quest for a truly universal and comprehensive “Atlas of Life”, as it is or as it could be on earth.

Sambriddhi Mainali, Max H. Garzon, Fredy A. Colorado
Production of 3D-Printed Tympanic Membrane Scaffolds as a Tissue Engineering Application

In recent years, scaffolds produced with 3D printing technology have become a more widespread tool because they provide more advantages than traditional methods in tissue engineering applications. In this research, the aim was to produce patches, using the 3D printing method, for the treatment of tympanic membrane perforations, which cause significant hearing loss. Polylactic acid (PLA) scaffolds with chitosan (CS) added in various ratios were prepared for artificial eardrum patches. Different amounts of CS were added to PLA to obtain more biocompatible scaffolds. The created patches were designed by mimicking the thickness of the natural tympanic membrane, thanks to the precision provided by the 3D printing method. The produced scaffolds were analyzed separately for physical, chemical, morphological, mechanical and biocompatibility properties. Human adipose tissue-derived mesenchymal stem cells (hAD-MSCs) were used in a cell culture study to analyze the biocompatibility properties. 15 wt% PLA was chosen as the control group. The scaffold containing 3 wt% CS demonstrated significantly superior and favorable printing quality. The study continued with these two scaffolds (15PLA and 15PLA/3CS). This study showed that PLA and PLA/CS 3D-printed scaffolds are a potential application for repairing tympanic membrane perforations.

Elif Ilhan, Songul Ulag, Ali Sahin, Nazmi Ekren, Osman Kilic, Faik Nuzhet Oktar, Oguzhan Gunduz
Controlled Release of Metformin Loaded Polyvinyl Alcohol (PVA) Microbubble/Nanoparticles Using Microfluidic Device for the Treatment of Type 2 Diabetes Mellitus

Nowadays it has become obvious that the relentless increase in Type 2 diabetes mellitus (T2DM), already affecting the economically affluent countries, is gradually afflicting the developing world as well. The drugs currently used in the treatment of T2DM offer inefficient glucose control and carry serious side effects. In this study, nano-sized uniform particles were produced by a microfluidic method through the explosion of microbubbles. Morphological analysis (SEM), analysis of molecular interactions between the components (FT-IR), and a drug release test by UV spectroscopy were carried out after the production process. Examination of the optical microscope and SEM images of the microbubbles and nanoparticles showed that metformin was successfully loaded into the nanoparticles. The diameters of the microbubbles and nanoparticles were 104 ± 91 µm and 116 ± 13 nm, respectively. Metformin was released in a controlled manner at pH 1.2 for 390 min. The controlled release ability of metformin-loaded nanoparticles is promising in the treatment of T2DM.

Sumeyye Cesur, Muhammet Emin Cam, Fatih Serdar Sayın, Sena Su, Oguzhan Gunduz
Patch-Based Technology for Corneal Microbial Keratitis

Corneal opacities, which occur mainly due to microbial keratitis, are the fourth leading cause of blindness worldwide. Antimicrobial therapy is an alternative solution for microbial keratitis caused by Staphylococcus aureus and Pseudomonas aeruginosa. The aim of this study is to develop patches, using the electrospinning method, for the treatment of corneal keratitis, which causes significant corneal blindness. Polyvinyl alcohol (PVA) patches with gelatine (GEL) were studied in various ratios. Different amounts of gelatine were added to PVA to resemble the collagen fibril structure of the cornea. To give the patches an antimicrobial effect against the bacteria, a special plant extract was used. The produced corneal patches were examined separately for chemical, morphological, and antimicrobial properties. Scanning electron microscopy (SEM) and Fourier-transform infrared (FT-IR) spectroscopy were performed to observe the surface morphology and chemical structure of the patches, respectively.

Songul Ulag, Elif Ilhan, Burak Aksu, Mustafa Sengor, Nazmi Ekren, Osman Kilic, Oguzhan Gunduz

Computational Approaches for Drug Design and Personalized Medicine

Frontmatter
MARCO Gene Variations and Their Association with Cardiovascular Diseases Development: An In-Silico Analysis

Cardiovascular diseases (CVDs) represent the leading cause of morbidity and mortality in both developed and developing countries. They have a complex etiology, influenced by several risk factors including the genetic component. Genetic variations have been shown to be highly associated with different CVD forms. With this objective, we analyzed the Macrophage Receptor with Collagenous structure (MARCO) gene: we performed an in-silico study with a genomic functional analysis to evaluate the effects of mutations on the structures and functionalities of the encoded proteins. We used dbSNP to retrieve single nucleotide polymorphisms (SNPs) of the MARCO gene, then proceeded to filtration and stability analysis using several bioinformatics tools to identify the most deleterious variations. Moreover, we predicted the 3D structures of the proteins encoded by the MARCO gene, which were validated using PROCHECK, and then analyzed and visualized the proteins’ 3D structures. The extraction of human MARCO gene SNPs revealed that dbSNP contains more than 14,000 SNPs. The filtration process revealed the variations G241V and G262W to be the most deleterious SNPs; indeed, I-Mutant and DUET showed decreased protein stability. The validation using PROCHECK revealed a total of 89.9% of MARCO protein residues to be in the favored region. In conclusion, our results suggest that the G241V and G262W variations can cause alterations in the proteins’ structures and functions. Hence, to improve health management, careful screening of these variants can be useful as a model for CVD diagnosis and helpful in pharmacogenomics.

Kholoud Sanak, Maryame Azzouzi, Mounia Abik, Fouzia Radouani
Computational Approaches for Drug Design: A Focus on Drug Repurposing

With the continuous advancement of biomedical instruments and the associated ability to collect diverse types of valuable biological data, numerous recent research studies have focused on how best to extract useful information from the Big Biomedical Data currently available. While drug design has been one of the most essential areas of biomedical research, the drug design process for the most part has not fully benefited from the recent explosive growth of biological data and bioinformatics tools. With the significant overhead associated with the traditional drug design process in terms of time and cost, new alternative methods, possibly based on computational approaches, are very much needed to provide innovative ways of identifying effective drugs and new treatment options. Employing advanced computational tools for drug design and precision treatments has been the focus of many research studies in recent years. For example, drug repurposing has gained significant attention from biomedical researchers and pharmaceutical companies as an exciting new alternative for drug discovery that benefits from computational approaches. Molecular profiling of diseases can be used to design customised treatments, and more effective approaches can be explored based on an individual’s genotype. With newly developed bioinformatics tools, researchers and clinicians can repurpose existing drugs and propose innovative therapies and precision treatment options. This development also promises to transform healthcare to focus more on individualized treatments, precision medicine and lower risks of harmful side effects. In particular, this potential new era in healthcare presents transformative opportunities to advance treatments for chronic and rare diseases.

Suyeon Kim, Ishwor Thapa, Farial Samadi, Hesham Ali

Computational Proteomics and Protein-Protein Interactions

Frontmatter
Comorbidity Network Analyses of Global Rheumatoid Arthritis and Type 2 Diabetes Reveal IL2 & IL6 as Common Role Players

Comorbidities are associated with more difficult clinical management, worse health outcomes and an overall increase in healthcare expenditure. Here, we present a novel method of finding the common key genes and pathways via comorbidity network analyses. Essentially, we deployed data from the RAvariome database and the Type 2 Diabetes Knowledge Portal for mutually exclusive interpopulation RA and T2D susceptibility genes, respectively. Protein interactomes (PINs) are built by mapping direct interactions between the above gene products and their interacting partners, along with a comorbid network combining both the RA and T2D PINs. Network centrality analyses of all PINs identified 18 overlapping proteins, with IL-6 and IL-2 being the common key role players in the comorbid PIN, despite being exclusive to our curated RA susceptibility gene list. Subsequent pathway analyses revealed the involvement of cellular senescence, MAPK and AGE-RAGE signalling in diabetic complications. We conclude that RA and T2D susceptibility genes do not necessarily translate into indispensable proteins in their individual or comorbid disease networks, but those of RA can outcompete T2D susceptibility genes despite the much larger T2D component in the comorbid network. Our method is a unique approach to finding key genes/proteins and implicated pathways in disease comorbidities.

Tuck Onn Liew, Rohit Mishra, Chandrajit Lahiri
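
The centrality-based ranking of key role players can be illustrated on a toy interactome. The edge list below is hypothetical (chosen only so that IL6 is the hub); degree and closeness centrality are computed with plain BFS, standing in for the full centrality analyses of the paper.

```python
from collections import deque

# Hypothetical undirected toy interactome (edges invented for illustration)
edges = [
    ("IL6", "IL2"), ("IL6", "STAT3"), ("IL6", "TNF"), ("IL6", "JAK1"),
    ("IL2", "JAK1"), ("IL2", "STAT5"), ("TNF", "NFKB1"),
    ("STAT3", "NFKB1"), ("STAT5", "JAK1"),
]

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def degree_centrality(adj):
    """Fraction of other nodes each node touches directly."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Closeness from BFS shortest-path distances (connected graph assumed)."""
    result = {}
    for src in adj:
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[src] = (len(adj) - 1) / sum(dist.values())
    return result

dc = degree_centrality(adj)
ranked = sorted(dc, key=dc.get, reverse=True)
```

Ranking nodes by such centrality measures is how "key role players" like IL-6 and IL-2 surface from a comorbid PIN.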
Variant Analysis from Bacterial Isolates Affirms DnaK Crucial for Multidrug Resistance

Next-generation sequencing and associated computational analyses have become powerful tools for comparing whole genomes and detecting single nucleotide polymorphisms (SNPs) within genes. In our study, we have identified specific mutations within the plausible drug resistance genes of eight multidrug resistant (MDR) bacterial species. Essentially, we have unearthed a few proteins, involved in folding and in enabling survival under stress, to be the most crucial ones in the network of the whole-genome protein interactome (PIN) of these species. To confirm the relevance of these proteins to antibiotic resistance, variant analyses were performed on all the selected MDR species, isolated from patients’ samples in the PATRIC database, against their respective reference genomes. The SNPs found in the patient isolates revealed a nucleotide change from C to A in DnaK, producing a single amino acid change that might lead to misfolding of proteins. Thus, we propose DnaK to be the best characterized bacterial chaperone with implications in multidrug resistance. Finally, to provide an alternative solution to tackle MDR, docking studies were performed with a phenaleno-furanone derivative, which revealed the highest binding energy and inhibition against DnaK.

Shama Mujawar, Amr Adel Ahmed Abd El-Aal, Chandrajit Lahiri
Topological Analysis of Cancer Protein Subnetwork in Deubiquitinase (DUB) Interactome

The ubiquitination pathway regulates many cellular events that underlie the development of various cancer types. This has led to tremendous interest in exploring the cancer-therapeutic potential of the ubiquitination components: the E1, E2 and E3 enzymes and the deubiquitinases (DUBs). Approximately 101 DUBs are encoded in the human genome, and studies on them have often been performed individually. This study was therefore conducted to observe the peculiarities of the cancer protein subnetwork within the DUB interactome, aiming to increase understanding of the relationship between DUBs and cancer from a systems biology point of view. To construct the DUB interactome, proteins associated with DUBs were extracted from the IMEx consortium database and the interaction network was visualized in Cytoscape. Cancer protein nodes were identified according to the list from the COSMIC Cancer Gene Census database and were extracted to form a subnetwork of 247 nodes and 326 edges. Some DUBs, such as BAP1, TNFAIP3, USP6, CYLD and USP44, are cancer proteins themselves, and 78 DUBs are directly associated with cancer-related proteins. Topological analysis with NetworkAnalyzer and CentiScaPe suggested that OTUB1, COPS5 and USP7 have the strongest characteristics, indicating that these DUBs play important roles in cancer-related pathways. Comparison with the essential protein subnetwork suggested that the cancer protein subnetwork tends to have a weaker clustering coefficient, lower betweenness centrality and higher closeness centrality. Overall, the topological analysis of the cancer protein subnetwork in the DUB interactome presented in this study helps to provide a deeper understanding of the biological significance of DUBs in cancer.

Nurulisa Zulkifle
Graph Based Automatic Protein Function Annotation Improved by Semantic Similarity

Functional annotation of proteins is a very challenging task, primarily because manual annotation requires a great amount of human effort and still cannot keep pace with the exponentially growing number of protein sequences entering public databases thanks to high-throughput sequencing technology. For example, the UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. According to the November 2019 release of UniProtKB, some 561,000 sequences are manually reviewed, but over 150 million sequences lack reviewed functional annotations. Moreover, manual annotation is expensive in both the cost it incurs and the time it takes. At the same time, exploiting this huge quantity of data is important for understanding life at the molecular level and is central to understanding human disease processes and drug discovery. To be useful, protein sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. The ability to automatically annotate protein sequences in UniProtKB/TrEMBL, the non-reviewed UniProt sequence repository, would represent a major step towards bridging the gap between annotated and un-annotated protein sequences. In this paper, we extend a neighborhood-based network inference technique for automatic GO annotation using a protein similarity graph built on protein domain and family information. The underlying philosophy of our approach assumes that proteins can be linked through the domains, families, and superfamilies they share. We propose an efficient pruning and post-processing technique that integrates the semantic similarity of GO terms. We show by empirical results that the proposed hierarchical post-processing potentially improves the performance of other GO annotation tools as well.
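The neighborhood-based inference philosophy can be sketched in miniature: proteins sharing a domain are linked, and an unannotated protein inherits the GO terms carried by enough of its annotated neighbors. This is a hypothetical toy example (the protein names, domain and GO identifiers, and the 50% support rule are illustrative assumptions, not the paper's actual graph construction or pruning):

```python
from collections import Counter

def shared_domain_neighbors(domains, protein):
    """Proteins that share at least one domain with `protein`."""
    return {p for p, ds in domains.items()
            if p != protein and ds & domains[protein]}

def infer_go_terms(domains, annotations, protein, min_support=0.5):
    """Assign GO terms carried by at least `min_support` of the
    annotated neighbors in the shared-domain graph."""
    neighbors = [p for p in shared_domain_neighbors(domains, protein)
                 if p in annotations]
    if not neighbors:
        return set()
    counts = Counter(t for p in neighbors for t in annotations[p])
    return {t for t, c in counts.items() if c / len(neighbors) >= min_support}

domains = {"P1": {"PF00069"}, "P2": {"PF00069", "PF07714"},
           "P3": {"PF00069"}, "Q": {"PF00069"}}
annotations = {"P1": {"GO:0004672"}, "P2": {"GO:0004672", "GO:0005524"},
               "P3": {"GO:0004672"}}
print(infer_go_terms(domains, annotations, "Q"))  # {'GO:0004672'}
```

The paper's semantic-similarity post-processing would further filter such candidate terms using the GO hierarchy; that step is omitted here.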

Bishnu Sarker, Navya Khare, Marie-Dominique Devignes, Sabeur Aridhi

Data Mining from UV/VIS/NIR Imaging and Spectrophotometry

Frontmatter
Cancer Detection Based on Image Classification by Using Convolution Neural Network

Breast cancer starts when cells in the breast begin to grow out of control. These cells usually form a tumor that can often be seen on an x-ray or felt as a lump. The tumor is malignant (cancerous) if the cells can grow into (invade) surrounding tissues or spread (metastasize) to distant areas of the body. The challenge of this project was to build an algorithm using a neural network that automatically identifies whether a patient is suffering from breast cancer by examining biopsy images. The algorithm must be accurate, because people's lives are at stake.

Mohammad Anas Shah, Abdala Nour, Alioune Ngom, Luis Rueda
Steps to Visible Aquaphotomics

This article discusses one of the most popular methods for studying water: spectrophotometry. It is well known that water absorbs strongly at most wavelengths of the electromagnetic spectrum. The water molecule, in the gaseous state, has three types of transition that can give rise to absorption of electromagnetic radiation: rotational, vibrational and electronic. In liquid water the rotational transitions are effectively quenched, but the absorption bands are affected by hydrogen bonding. In this paper, we mainly consider the behavior of the spectral characteristics of liquid water in the visible range of the spectrum. The results discussed in this article show the influence of path length on the spectral characteristics of water in the visible range.

Vladyslav Bozhynov, Oleksandr Mashchenko, Pavla Urbanova, Zoltan Kovacs
Automatic Calibration, Acquisition, and Analysis for Color Experiments

This article presents a device for color experiments with automatic setup. The device is designed to carry out measurements of the color of semi-homogeneous objects. The lighting conditions are controlled by a mikroPc and standardized for the experiments. The camera setup and calibration are performed automatically, using the Gray World assumption for white balance against a defined background. The calibration is further adjusted for brightness and contrast, and for the difference between black and white versus light and dark. The device is connected to software for automatic color analysis: the object is segmented from the background and transformed into several color space representations, and the color channel distributions are statistically evaluated. The device should serve in experiments on color analysis, stability, changes and comparison in fish or food production. Moreover, extensions of the analysis could be used in biometric or disease-detection tasks.

Jan Urban
Classification of Fish Species Using Silhouettes

Classifying fish silhouettes allows a quick decision on which fish species are present in a given scene, and in what numbers. A classical machine learning approach is used to test whether the silhouette classes of different fish species are linearly separable. Preprocessing of the images consisted of object-to-background segmentation and image registration. The classifier is trained using a modified Rosenblatt algorithm with a discriminant-analysis loss function. This article disseminates preliminary results of training and testing a classifier for six fish species. The images were of varying quality and lighting conditions. A classifier with the option to abstain from a decision is introduced and compared. The results are discussed from the point of view of the usability of classical methods, preprocessing conditioning, and parametrization of the loss function.

Pavla Urbanova, Vladyslav Bozhynov, Petr Císař, Miloš Železný

E-Health Technology, Services and Applications

Frontmatter
Spa-neg: An Approach for Negation Detection in Clinical Text Written in Spanish

Electronic health records contain valuable information written in narrative form. A relevant challenge in clinical narrative text is that concepts commonly appear negated. Several proposals have been developed to detect negation in clinical text written in Spanish. Many of these proposals have adapted the NegEx algorithm to Spanish, but the results obtained indicate lower performance than NegEx implementations in other languages. Moreover, in most of these proposals the validation process could be improved by using a shared test corpus focused on negation in clinical text. This paper proposes Spa-neg, an approach to improve negation detection in clinical text written in Spanish. Spa-neg combines three elements: (i) an exploratory data analysis of how negation is written in clinical text; (ii) regular expressions adapted to the way negation is expressed in Spanish; and (iii) experiments and validation using a shared annotated corpus focused on negation. Our findings suggest that the combination of these elements improves negation detection. The tests performed achieved a 92% F-score on IULA, a Spanish corpus annotated for negation in clinical text.
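A minimal NegEx-style sketch of trigger-based negation detection with Spanish regular expressions can illustrate the general idea; the trigger list and the token window below are illustrative assumptions, not Spa-neg's actual rules:

```python
import re

# A few common Spanish negation triggers (illustrative subset).
NEG_TRIGGERS = r"\b(no|sin|niega|ausencia de|descarta)\b"

def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """True if a negation trigger occurs within `window` tokens
    before the concept in the sentence."""
    tokens = sentence.lower().split()
    concept = concept.lower()
    for i, tok in enumerate(tokens):
        if tok == concept:
            context = " ".join(tokens[max(0, i - window):i])
            if re.search(NEG_TRIGGERS, context):
                return True
    return False

print(is_negated("El paciente niega dolor abdominal", "dolor"))    # True
print(is_negated("El paciente presenta dolor abdominal", "dolor")) # False
```

A production system would also handle punctuation, multi-word concepts and scope-terminating conjunctions, which this sketch ignores.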

Oswaldo Solarte-Pabón, Ernestina Menasalvas, Alejandro Rodriguez-González
In-Bed Posture Classification from Pressure Mat Sensors for the Prevention of Pressure Ulcers Using Convolutional Neural Networks

Given the current population aging around the world, a good amount of technology should be focused on the care of older people, improving their living conditions. In this work, we propose a methodology for classifying in-bed human posture using pressure mat sensors, for the prevention of pressure ulcers. First, we provide a visual representation of the raw pressure data as grayscale images using fuzzy processing. Second, we generate a large dataset from a limited one using ad hoc data augmentation, producing new synthetic sleeping positions. Third, we define two CNN models to evaluate the impact of the number of layers on the performance of in-bed posture classification. The results show encouraging performance on a small dataset under leave-one-participant-out cross-validation.
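The data-augmentation step can be illustrated with a minimal numpy sketch that mirrors and shifts a toy pressure frame; the specific transformations used in the paper may differ, and the frame below is invented:

```python
import numpy as np

def augment_pressure_map(frame: np.ndarray):
    """Generate synthetic in-bed postures from one pressure frame:
    a left/right mirror plus small vertical shifts of the body."""
    variants = [np.fliplr(frame)]
    for shift in (-1, 1):
        variants.append(np.roll(frame, shift, axis=0))
    return variants

frame = np.zeros((6, 4))
frame[2:4, 1:3] = 1.0           # toy pressure blob standing in for a body
augmented = augment_pressure_map(frame)
print(len(augmented))           # 3
```

Each synthetic frame keeps the same label as the original, which is what lets a small labeled dataset be expanded for CNN training.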

Aurora Polo Rodríguez, David Gil, Chris Nugent, Javier Medina Quero
Pin-Code Authentication by Local Proximity Based Touchstroke Classifier

Most recently released touchscreen devices enable fingerprint authentication, while pin-codes linked to the fingerprints remain in common use. Among the various biometric feature extractors (time-based, frequency-based or even pressure-based), the location-based approach is the most reliable and mathematically tractable for enhancing the security of touchscreen devices. Therefore, in this paper we propose the fundamentals of a novel feature extraction protocol that strengthens pin-codes by calculating the local proximity of touches on the screen, without long training sessions. We present a fuzzy-like area methodology for computing an output for each input, and conduct experiments showing how the outputs discriminate between genuine and fraudulent attempts.

Orcan Alpar, Ondrej Krejcar
Behavioral Risk Factors Based Cancer Prediction Model Utilizing Public and Personal Health Records

Cancer has become one of the major public health problems worldwide, while its etiology remains largely unknown. Even with advances in cancer research, predicting cancer has not become as practical as predicting other chronic diseases such as diabetes. However, behavioral risk factors (BRFs) may be used to monitor and predict cancer, and can be used to demonstrate and educate the general public on potential cancer pathways and possible outcomes. For that purpose, we analyzed public health data and personal health records and discovered histological features related to behavioral risk factors in cancer. These characteristics are then used to assess each individual's BRFs for classification and prognosis of the disease. This plays a key role in many health-related realms, including increasing patient awareness of BRFs, developing related medical procedures, handling patients' records and treating chronic diseases. In this paper, we present a predictive analysis based on BRFs that may increase the chances of developing cancer.

Emil Saweros, Yeong-Tae Song
IoMT-Driven eHealth: A Technological Innovation Proposal Based on Smart Speakers

Internet of Medical Things (IoMT) is a technological concept applied in healthcare contexts to achieve the digital interconnection of everyday objects with the Internet in order to make life easier for people. IoMT can help monitor, inform and notify not only caregivers, but also provide healthcare providers with actual data to identify issues before they become critical or to allow earlier intervention. In this sense, this paper is contextualized in Assisted Reproduction Treatment (ART) processes, aiming to reduce the number of hospital visits, reduce healthcare costs and improve patient care, as well as the productivity of healthcare professionals. We therefore present an IoMT-based technological proposal to manage and control the prescription of pharmacological treatments for patients undergoing ART processes. In this context, we propose the integration of iMEDEA (a modular system specialized in the management of electronic clinical records for ART units) with smart speaker devices (specifically, Amazon's Alexa), as well as the validation of our proposal in the real environment offered by the Inebir clinic.

David Domínguez, Leticia Morales, Nicolas Sánchez, Jose Navarro-Pando

Evolving Towards Digital Twins in Healthcare (EDITH)

Frontmatter
Epileptic Seizure Detection Using a Neuromorphic-Compatible Deep Spiking Neural Network

Monitoring the brain activity of Drug-Resistant Epileptic (DRE) patients is crucial for the effective management of chronic epilepsy. Implementing machine learning tools for analyzing electrical signals acquired from the cerebral cortex of DRE patients can lead to the detection of a seizure prior to its development. Therefore, the objective of this work was to develop a deep Spiking Neural Network (SNN) for epileptic seizure detection. Energy- and computation-efficient SNNs are well suited to neuromorphic systems, making them an adequate model for edge-computing devices such as healthcare wearables. In addition, the integration of SNNs with neuromorphic chips enables the secure analysis of sensitive medical data without cloud computations.
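The basic computational unit of an SNN can be sketched as a leaky integrate-and-fire neuron; the parameter values below are illustrative, and the paper's model is a trained deep network rather than this single neuron:

```python
def simulate_lif(current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: the membrane potential decays
    by `leak` each step, integrates the input current, and emits a
    spike (resetting to zero) whenever it crosses `threshold`."""
    v, spikes = 0.0, []
    for i in current:
        v = leak * v + i
        if v >= threshold:
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes

print(simulate_lif([0.5] * 6))  # [0, 0, 1, 0, 0, 1]
```

Because information is carried by sparse binary spikes rather than dense activations, such units map naturally onto low-power neuromorphic hardware.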

Pouya Soltani Zarrin, Romain Zimmer, Christian Wenger, Timothée Masquelier
Anonymizing Personal Images Using Generative Adversarial Networks

This paper introduces a first approach to using Generative Adversarial Networks (GANs) for the generation of fake images, with the objective of anonymizing patient information in the health sector. The aim is to create valuable images that can be used in both educational and research settings while avoiding the risk of sensitive data leakage. For this purpose, a thorough survey of the GAN state of the art and of available databases was first carried out. The outcome of this research is a GAN system prototype adapted to generate personal images imitating provided samples. The performance of this prototype has been checked and satisfactory results have been obtained. Moreover, a novel research pathway has been opened for further work.

Esteban Piacentino, Cecilio Angulo
Generating Fake Data Using GANs for Anonymizing Healthcare Data

EDITH is a project aiming to orchestrate an ecosystem for the manipulation of reliable and safe data in the field of health, proposing the creation of digital twins for personalised healthcare. This paper elaborates on a first approach to using Generative Adversarial Networks (GANs) for the generation of fake data, with the objective of anonymizing user information in the health sector. The aim is to create valuable data that can be used in both educational and research settings while avoiding the risk of sensitive data leakage. While GANs are mainly applied to images and video frames, we propose to encode raw data in the form of an image so it can be processed by a GAN and then decoded back to the original data domain. The performance of this prototype has been demonstrated, and a novel research pathway has been opened for further developments.
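The encode-to-image/decode-back idea can be sketched as a simple round trip; the min-max normalisation and the 8x8 layout below are illustrative assumptions, not the project's actual encoding:

```python
import numpy as np

def record_to_image(record, side=8):
    """Min-max normalise a numeric record and pad it into a
    side x side grayscale array that a GAN could consume."""
    x = np.asarray(record, dtype=float)
    lo, hi = x.min(), x.max()
    scaled = (x - lo) / (hi - lo)
    img = np.zeros(side * side)
    img[:len(scaled)] = scaled
    return img.reshape(side, side), (lo, hi, len(x))

def image_to_record(img, meta):
    """Invert the encoding back to the original data domain."""
    lo, hi, n = meta
    return img.ravel()[:n] * (hi - lo) + lo

record = [120.0, 80.0, 36.6, 98.0]   # hypothetical vitals record
img, meta = record_to_image(record)
restored = image_to_record(img, meta)
print(np.allclose(restored, record))  # True
```

Once records live in image space, synthetic records generated by an image GAN can be decoded the same way.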

Esteban Piacentino, Cecilio Angulo
A Proposal to Evolving Towards Digital Twins in Healthcare

The main objective of this proposal is to orchestrate an ecosystem for the manipulation of reliable and safe data in the field of health, specifically lung cancer, by creating digital twins for personalised healthcare that model the behaviour of this disease in patients. The digital twin is a popular and novel approach in industrial digitisation units that will be used by two kinds of experts: (i) data analysts, who will design expert recommender systems and extract knowledge (explainable Artificial Intelligence, AI); and (ii) medical professionals, who will consume the knowledge generated for better diagnosis. This knowledge generation/extraction process will work as a lifelong learning system through iterative and continuous use. The resulting software platform will be abstracted so it can be applied as a general-purpose service tool in other knowledge domains, especially health and industry. Furthermore, a rule extraction module will be made available for explainability purposes.

Cecilio Angulo, Luis Gonzalez-Abril, Cristóbal Raya, Juan Antonio Ortega

High Performance in Bioinformatics

Frontmatter
Role of Homeobox Genes in the Development of Pinus Sylvestris

Comprehensive gene expression profiling of homeobox gene family members allows their roles in Pinus sylvestris growth and development to be retrieved. Homeobox genes encode transcription factors playing important roles in the development of an organism. Homeodomains are common across a vast number of species and can therefore be identified even in non-model organisms. Understanding homeobox gene function supports the investigation of tissue development and suggests ways to regulate it. Homeobox genes are understudied in Scots pine. Hence, we assembled a de novo transcriptome of Pinus sylvestris from five tissues. The transcriptome comprises 775 502 transcripts, among which 243 homeobox-containing transcripts were found; differential expression (DE) analysis was carried out on these sequences. We obtained five clusters of homeobox DE genes (visualized as a heatmap of gene expression), and the DE genes were annotated. The results give some insight into the development of bud and mature tissues of Pinus sylvestris.

Tatiana Guseva, Vladislav Biriukov, Michael Sadovsky
Function vs. Taxonomy: Further Reading from Fungal Mitochondrial ATP Synthases

We studied the relations between the triplet composition of the family of mitochondrial atp6, atp8 and atp9 genes, their function, and the taxonomy of their bearers. The points in a 64-dimensional metric space corresponding to the genes were clustered, and it was found that they separate into three clusters corresponding to those genes. In total, 223 mitochondrial genomes were included in the database.
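The 64-dimensional triplet representation can be sketched as a sliding-window frequency count over the four-letter alphabet; the exact windowing used by the authors may differ:

```python
from itertools import product

def triplet_frequencies(seq: str):
    """64-dimensional frequency vector over all triplets,
    counted with a sliding window of step 1."""
    triplets = ["".join(t) for t in product("ACGT", repeat=3)]
    counts = dict.fromkeys(triplets, 0)
    for i in range(len(seq) - 2):
        w = seq[i:i + 3]
        if w in counts:
            counts[w] += 1
    total = sum(counts.values()) or 1
    return {t: c / total for t, c in counts.items()}

freqs = triplet_frequencies("ATGGCCATT")
print(len(freqs), round(sum(freqs.values()), 6))  # 64 1.0
```

Each gene becomes one such 64-dimensional point, and standard clustering can then be applied to the resulting point cloud.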

Victory Fedotovskaya, Michael Sadovsky, Anna Kolesnikova, Tatiana Shpagina, Yulia Putintseva
Discovering the Most Characteristic Motif from a Set of Peak Sequences

Chromatin immunoprecipitation experiments followed by sequencing of the precipitated fragments (ChIP-Seq) are subject to great uncertainty due to execution errors, technical and computational limitations, and the inherent complexity of the biological systems under study. Therefore, one of the challenges researchers face when analyzing the results of ChIP-Seq experiments is elucidating the pattern behind the obtained sequences (peaks) in the face of a huge amount of data and noise. A significant number of statistical tools and algorithms have been proposed to address this issue in recent years. The method presented in this paper innovates by taking advantage of both the structure of the data obtained in these experiments (peaks) and existing resources. The motif or pattern obtained from these peaks by this procedure is considered the most characteristic motif. The method also yields quality metrics for the analyzed experiment, and has been validated with data retrieved from public repositories.
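A much-simplified sketch of deriving a representative pattern from a set of peak sequences is a positionwise consensus; the actual method is more sophisticated, and the sequences below are invented:

```python
from collections import Counter

def consensus_motif(peaks):
    """Most frequent base at each position across equal-length
    peak sequences; ties are resolved alphabetically."""
    length = len(peaks[0])
    motif = []
    for i in range(length):
        counts = Counter(p[i] for p in peaks)
        base = min(counts, key=lambda b: (-counts[b], b))
        motif.append(base)
    return "".join(motif)

peaks = ["TACGAT", "TACAAT", "TATAAT", "GATAAT"]
print(consensus_motif(peaks))  # TACAAT
```

Real motif discovery works on aligned, variable-length peaks and scores positions probabilistically (e.g. with position weight matrices) rather than by simple majority vote.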

Ginés Almagro-Hernández, Jesualdo Tomás Fernández-Breis
An Overview of Search and Match Algorithms Complexity and Performance

DNA provides a considerable amount of information about our biology, necessary to study ourselves and learn about variant characteristics. Even though we can extract DNA from cells and sequence it, processing it is far from a one-step task. Over the past years, biologists have attempted to "decipher" the DNA code. Keyword search and string matching algorithms play a vital role in computational biology: relationships between sequences define the biological function and structure of the sequences involved. Finding such similarities is a challenging research area, involving big data, that can bring a better understanding of the evolutionary and genetic relationships among genes. This paper studies and analyzes different kinds of string matching algorithms used for biological sequences, assessing their complexity and performance.
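As an example of the class of exact string matching algorithms such a survey covers, here is a self-contained Knuth-Morris-Pratt implementation (this illustrates one standard algorithm; the paper compares several):

```python
def kmp_search(text: str, pattern: str):
    """Knuth-Morris-Pratt: all start positions of `pattern` in `text`
    in O(len(text) + len(pattern)) time."""
    # Failure function: longest proper prefix that is also a suffix.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits, k = [], 0
    for i, c in enumerate(text):
        while k and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits

print(kmp_search("ACGTACGTGACGT", "ACGT"))  # [0, 4, 9]
```

Unlike the naive quadratic scan, KMP never re-examines a text character, which matters at genome scale.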

Maryam Abbasi, Pedro Martins
Highly Parallel Convolution Method to Compare DNA Sequences with Enforced In/Del and Mutation Tolerance

A new error-tolerant method for the comparison and analysis of symbol sequences is proposed. The method is based on the calculation of a convolution function defined over binary numeric sequences obtained by a specific transformation of the original symbol sequence. The method allows a highly parallel implementation and is of great value for the search for insertion/deletion mutations. In the implementation, the convolution function is computed with the fast Fourier transform.
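The FFT-based convolution idea can be sketched for match counting: each sequence is expanded into per-symbol binary indicator vectors, and their FFT cross-correlation counts positional matches at every shift. This is a simplified sketch of the general approach, without the in/del-tolerance machinery of the paper:

```python
import numpy as np

def match_profile(x: str, y: str):
    """Positionwise symbol matches between `x` and a copy of `y` slid
    to every shift, via FFT correlation of binary indicator vectors.
    Entry len(y)-1+s corresponds to shift s of y along x."""
    n = len(x) + len(y) - 1
    total = np.zeros(n)
    for a in "ACGT":
        fx = np.array([c == a for c in x], dtype=float)
        fy = np.array([c == a for c in y], dtype=float)
        # Correlation = convolution with the reversed second signal.
        X = np.fft.rfft(fx, n)
        Y = np.fft.rfft(fy[::-1], n)
        total += np.fft.irfft(X * Y, n)
    return np.rint(total).astype(int)

x, y = "ACGTT", "CGT"
profile = match_profile(x, y)
print(profile.max())  # 3, where "CGT" aligns with x[1:4]
```

Each symbol's correlation is independent, which is what makes the method embarrassingly parallel across symbols (and across FFT blocks).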

Anna Molyavko, Vladimir Shaidurov, Eugenia Karepova, Michael Sadovsky
A Mini Review on Parallel Processing of Brain Magnetic Resonance Imaging

Parallel processing is the execution of processes that carry out many computations simultaneously, and parallel processing methods are applied extensively to the examination of MR imaging in treatment. As parallel computer systems become larger and faster, scientists, researchers and engineers are finally able to solve problems in medicine that previously took too long to run. Various fields, including medicine and bioinformatics, have therefore already taken advantage of parallel processing. In this review, we analyze key concepts and prominent parallel processing methods that have been used to analyze brain MRI images, and we present a large number of examples from the current literature in a comprehensive literature matrix. Based on this literature matrix, created from a Web of Science analysis, information graphics are presented in a comprehensive manner. Overall, parallel processing methods in brain magnetic resonance imaging offer powerful alternatives to computer clusters for running large, distributed applications.

Ayca Kirimtat, Ondrej Krejcar, Rafael Dolezal, Ali Selamat
Watershed Segmentation for Peak Picking in Mass Spectrometry Data

Mass spectrometry coupled with gas chromatography is one of the emerging high-resolution instruments. The technology can be used to determine the composition of chemical compounds, for targeted detection or untargeted screening, and it produces a large volume of high-precision measurements. There is an emerging need to process these data efficiently and to identify and extract all possible information. Numerous tools exist for this, sharing common steps; one such step is peak picking, usually carried out by signal processing methods. We propose a two-dimensional approach to identify peaks and extract their features for further analysis. The method can easily be adapted to fit current pipelines and performs the computation efficiently. We first preprocess the data onto a grid of the required precision, and then apply the watershed image processing method to extract the regions of interest and the peaks.
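The grid-then-detect idea can be illustrated with a minimal local-maxima finder: seed points above an intensity threshold, from which a watershed would then grow regions. This is a pure-numpy sketch with an invented toy grid, not the paper's full watershed segmentation:

```python
import numpy as np

def local_maxima(grid, min_intensity=0.0):
    """(row, col) positions strictly greater than their 4-neighbours
    and above `min_intensity`: the seed points a watershed would
    flood regions from."""
    padded = np.pad(grid, 1, constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    peaks = ((center > padded[:-2, 1:-1]) & (center > padded[2:, 1:-1]) &
             (center > padded[1:-1, :-2]) & (center > padded[1:-1, 2:]) &
             (center > min_intensity))
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(peaks))]

grid = np.zeros((5, 5))
grid[1, 1] = 4.0   # toy peaks; axes stand in for retention-time and m/z bins
grid[3, 3] = 7.0
print(local_maxima(grid, min_intensity=1.0))  # [(1, 1), (3, 3)]
```

In the full method, the watershed transform then assigns every grid cell to the basin of one such seed, so peak areas and shapes can be extracted, not just apex positions.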

Vojtěch Bartoň, Markéta Nykrýnová, Helena Škutková

High-Throughput Genomics: Bioinformatic Tools and Medical Applications

Frontmatter
LuxHS: DNA Methylation Analysis with Spatially Varying Correlation Structure

Bisulfite sequencing (BS-seq) is a popular method for measuring DNA methylation at base-pair resolution. Many BS-seq data analysis tools rely on the assumption of spatial correlation among the methylation states of neighboring cytosines. While this is a fair assumption, most existing methods leave out the possibility of deviation from the spatial correlation pattern. Our approach builds on a method that combines a generalized linear mixed model (GLMM) with a likelihood specific to BS-seq data that incorporates spatial correlation of methylation levels. We propose a novel technique using a sparsity-promoting prior to allow cytosines to deviate from the spatial correlation pattern. The method is tested with both simulated and real BS-seq data and compared with other differential methylation analysis tools.

Viivi Halla-aho, Harri Lähdesmäki
Unravelling Disease Presentation Patterns in ALS Using Biclustering for Discriminative Meta-Features Discovery

Amyotrophic Lateral Sclerosis (ALS) is a heterogeneous neurodegenerative disease with highly variable presentation patterns, impacting patient care and survival. Given the heterogeneous nature of ALS patients, and targeting a better prognosis, clinicians usually estimate disease progression at diagnosis using the rate of decay computed from the Revised ALS Functional Rating Scale (ALSFRS-R). In this context, we aim to unravel disease presentation patterns by proposing a new Biclustering-based approach, termed Discriminative Meta-features Discovery (DMD). These patterns (Meta-features) are composed of discriminative subsets of features together with their values, allowing subgroups of patients with similar disease presentation patterns to be distinguished and characterized. The proposed methodology was used to characterize groups of ALS patients with different progression rates (Slow, Neutral and Fast) using Biclustering-based Classification and Class Association Rule Mining. The patterns found for each of the three progression groups (described either as important features used by a Random Forest or as interpretable Association Rules) were validated by expert ALS clinicians, who were able to recognize relevant characteristics of slow, neutral and fast progressing patients. These results suggest that our general Biclustering approach is a promising way to unravel disease presentation patterns and can be applied to similar problems and other diseases.

Joana Matos, Sofia Pires, Helena Aidos, Marta Gromicho, Susana Pinto, Mamede de Carvalho, Sara C. Madeira
Patient Stratification Using Clinical and Patient Profiles: Targeting Personalized Prognostic Prediction in ALS

Amyotrophic Lateral Sclerosis (ALS) is a severe neurodegenerative disease with highly heterogeneous disease presentation and progression patterns. This hampers effective treatments targeting all patients, and finding a cure is still a challenge. In this scenario, patient stratification is believed to be a key tool for dealing with the heterogeneous nature of the disease, promoting the discovery of more homogeneous groups of patients that can then be used to improve patient prognosis and care. In this work, we propose to use clustering to stratify patient observations in accordance with clinically defined subsets of features (Clinical Profiles). The groups obtained by clustering patients using the Clinical Profiles are called Patient Profiles. Each patient profile is then used to learn a specialized prognostic model to predict the need for Non-Invasive Ventilation (NIV) within a time window of 90 days, and the profile-specific prognostic models are then combined by ensemble learning. We used three clinical profiles (prognostic, respiratory and functional) based on complementary, clinically relevant views of disease presentation and progression. These clinical profiles yielded two, four, and two patient profiles, respectively. The specialized prognostic models learned from these clinical and patient profiles show overall improvements when compared to the baseline models, where patients are not stratified. These promising results highlight the need for patient stratification for prognostic prediction in ALS. Furthermore, this innovative approach to prognostic prediction, where clinical profiles and patient profiles are integrated to enhance patient stratification, can be used to improve predictions for other disease outcomes in ALS or applied to other diseases.

Sofia Pires, Marta Gromicho, Susana Pinto, Mamede de Carvalho, Sara C. Madeira
Micro-Variations from RNA-seq Experiments for Non-model Organisms

RNA-based high-throughput sequencing technologies provide a huge number of reads from transcripts. Beyond expression analyses, transcriptome reconstruction, or isoform detection, these reads can be useful for the detection of gene variations, in particular micro-variations (single nucleotide polymorphisms [SNPs] and indels). Gene variation calling is usually based on homogeneous (single-individual) DNA-seq data, but this study aims to use heterogeneous (multi-individual) RNA-seq data to obtain clues about the gene variability of a population. New algorithms or workflows are therefore required to fill this usually disregarded gap. Here we present an automated workflow based on existing software to predict micro-variations from RNA-seq data using a genome or a transcriptome as reference. It can deal with organisms whose genome sequence is known and well annotated, as well as non-model organisms for which only draft genomes or transcriptomes are available. Mapping is based on STAR in both cases. Micro-variation detection relies on GATK (combining Mutect2 and HaplotypeCaller) and VarScan, since they provide reliable results from RNA-seq reads. The workflow was tested with reads from normal and diseased lung tissue from patients with small-cell lung carcinoma. The human genome and the human transcriptome were used as references and then compared: of the initial 120 000 micro-variations, only 267 were predicted by at least two algorithms in the exome of the patients. The workflow was also tested on non-model organisms such as the Senegalese sole, using its transcriptome as reference, to determine micro-variations in sole larvae exposed to different salinities. The workflow therefore appears to produce robust and reliable micro-variations in coding genes from RNA-seq data, irrespective of the nature of the reference sequence.
We think this paves the way to correlating micro-variations with differentially expressed genes in non-model organisms, with the aim of fostering breeding plans.

Elena Espinosa, Macarena Arroyo, Rafael Larrosa, Manuel Manchado, M. Gonzalo Claros, Rocí­o Bautista
Network-Based Variable Selection for Survival Outcomes in Oncological Data

The accessibility of "big data" poses an ambitious challenge in the medical field, especially in personalized medicine, where gene expression data are increasingly being used to establish diagnoses and optimize the treatment of oncological patients. However, the high-dimensional nature of the data brings many constraints, for which several approaches have been considered, with regularization techniques at the forefront of current research. Additionally, the network structure of gene expression data has fostered the development of network-based regularization techniques that bring the data to a low-dimensional and interpretable level. In this work, the classical elastic net and two recently proposed network-based methods, HubCox and OrphanCox, are applied to high-dimensional gene expression data to model survival. An oncological transcriptomic dataset obtained from The Cancer Genome Atlas (TCGA) is used, with patients' RNA-seq measurements as covariates. Applying sparsity-inducing techniques to the dataset enabled the selection of relevant genes over a range of evaluated parameters. Comparable results were obtained for the elastic net and the network-based OrphanCox in terms of model performance and genes selected.

Eunice Carrasquinha, André Veríssimo, Marta B. Lopes, Susana Vinga
Evaluating Basic Next-Generation Sequencing Parameters in Relation to True/False Positivity Findings of Rare Variants in an Isolated Population from the Czech Republic South-Eastern Moravia Region with a High Incidence of Parkinsonism

Next-generation sequencing of 16 genes known to be associated with parkinsonism, including coding DNA, intron/exon boundaries, and UTR loci, was used to find rare variants in 30 patients and 12 healthy controls from an isolated population of South-Eastern Moravia in the Czech Republic, where epidemiological data have shown a significantly increased prevalence of parkinsonism (2.9%). The aim of the study is to evaluate the true/false positivity ratio in relation to the basic sequencing parameters (coverage, type of mutation – SNV/INDEL, percentage of rare variants in heterozygosity, ± strand bias, and length of homopolymers). The final filtered rare variants were obtained from the Ion Torrent platform with the following workflow: Torrent Suite base calling and BAM mapping, Ion Reporter variant calling, and rare variant filtering. True positive findings were distinguished from false ones by confirmatory Sanger sequencing. In total, 36 rare variants (MAF < 1%) were found, of which 50% were confirmed as true positive. For SNVs, the probability of false positivity is 12%; for INDELs, the false positivity proportion is 84%. A high correlation between the strand biases of the reference and rare variant alleles in heterozygous findings could be a very strong indicator of true positive variants.
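The strand-bias indicator mentioned at the end of the abstract amounts to correlating, across heterozygous sites, the strand bias of the reference allele with that of the variant allele. A minimal sketch with a hand-rolled Pearson correlation; the bias values (fraction of forward-strand reads) are illustrative, not the study's data:

```python
# Correlation of reference-allele vs variant-allele strand bias across
# heterozygous sites; high correlation suggests a true positive variant.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

ref_bias = [0.48, 0.52, 0.61, 0.39, 0.55]  # forward-read fraction, ref allele
var_bias = [0.47, 0.55, 0.60, 0.41, 0.52]  # forward-read fraction, variant allele
r = pearson(ref_bias, var_bias)
```

When both alleles share the same strand imbalance (high r), the imbalance is likely a locus-level artifact affecting both alleles equally rather than a sign of a false call.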

Radek Vodicka, Kristyna Kolarikova, Radek Vrtel, Katerina Mensikova, Petr Kanovsky, Martin Prochazka
Detection of Highly Variable Genome Fragments in Unmapped Reads of Escherichia coli Genomes

Whole-genome sequencing has become a powerful tool in the study of closely related bacterial populations, as single-nucleotide changes can be detected. However, the postprocessing of the obtained data still remains problematic. Reference-based assembly of genomes only allows identification of the shared parts of genomes. Here, we show a pipeline for de novo assembly of unmapped reads and for locating variable regions within them. The identified regions can be used as new markers for bacterial genotyping.
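One common way to score how variable a region is across assembled fragments is per-column Shannon entropy over an alignment; columns above a threshold become candidate genotyping markers. A toy sketch of that idea (the sequences and the 0.5-bit threshold are illustrative, not the pipeline's actual parameters):

```python
# Per-column Shannon entropy over aligned fragments; high-entropy columns
# mark variable positions usable as genotyping markers.
import math

def column_entropy(column):
    n = len(column)
    freqs = {base: column.count(base) / n for base in set(column)}
    return -sum(p * math.log2(p) for p in freqs.values())

aligned = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]  # one row per isolate
entropies = [column_entropy(col) for col in zip(*aligned)]
variable = [i for i, h in enumerate(entropies) if h > 0.5]
```

Here columns 2 and 4 carry a minority base, so they exceed the threshold while the conserved columns score zero.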

Marketa Nykrynova, Vojtech Barton, Matej Bezdicek, Martina Lengerova, Helena Skutkova

Machine Learning in Bioinformatics

Frontmatter
Clustering Reveals Common Check-Point and Growth Factor Receptor Genes Expressed in Six Different Cancer Types

Cancer diagnosis and prognosis have been significantly impacted by advances in gene expression data analysis. Several groups have utilized supervised and unsupervised machine learning tools for classification and prediction on gene expression data sets. Clustering, principal component analysis, and regression are important and promising tools for analyzing gene expression data. The complexity and high dimensionality of these data, combined with limited samples, make it challenging to identify common patterns. Features of high-dimensional data contributing to a cluster generated by a finite mixture of underlying probability distributions can be handled with a model-based clustering method. While some groups have shown that projective clustering and ensemble techniques can be effective against these challenges, we have employed clustering on 6 different cancer types to address the problem of multi-dimensionality and to extract common gene expression patterns. Our analysis has revealed an expression pattern of 42 genes common to all cancer types, most of which are involved in important check-point and growth factor receptor functions associated with cancer pathophysiology.
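The final step of such an analysis, extracting the genes shared by the clusters of every cancer type, reduces to a set intersection. A minimal sketch; the gene symbols and cancer-type labels are illustrative, not the paper's 42-gene list:

```python
# Genes common to the expression clusters of several cancer types, obtained
# by intersecting per-cancer gene sets (illustrative symbols).
from functools import reduce

clusters = {
    "BRCA": {"EGFR", "CDK4", "TP53", "ERBB2"},
    "LUAD": {"EGFR", "CDK4", "KRAS", "TP53"},
    "COAD": {"EGFR", "CDK4", "TP53", "APC"},
}
common = reduce(set.intersection, clusters.values())
```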

Shrikant Pawar, Aditya Stanam, Chandrajit Lahiri
Predicting Infectious Diseases by Using Machine Learning Classifiers

The change and evolution of certain health variables can provide evidence that facilitates the diagnosis of infectious diseases. For this kind of disease, it is important to monitor some of the patients’ variables over a particular period. It is possible to build a prediction model from previously stored registers holding this information. This model can give the probability of developing the disease from input data. Machine learning algorithms can generate these prediction models, which classify samples composed of clinical parameters in order to predict whether an infectious disease will develop. The prediction models are trained on patients’ registers previously collected and stored over time. This work reports an experience of applying machine learning techniques to classify samples of different infectious diseases. In addition, we have studied the influence of the different clinical parameters on the classification, which could be very useful for the medical staff in order to monitor certain parameters carefully.

Juan A. Gómez-Pulido, José M. Romero-Muelas, José M. Gómez-Pulido, José L. Castillo Sequera, José Sanz Moreno, María-Luz Polo-Luque, Alberto Garcés-Jiménez
Bayesian Optimization Improves Tissue-Specific Prediction of Active Regulatory Regions with Deep Neural Networks

The annotation and characterization of tissue-specific cis-regulatory elements (CREs) in non-coding DNA represents an open challenge in computational genomics. Several prior works show that machine learning methods, using epigenetic or spectral features directly extracted from DNA sequences, can predict active promoters and enhancers in specific tissues or cell lines. In particular, deep-learning techniques have very recently obtained state-of-the-art results in this challenging computational task. In this study, we provide additional evidence that Feed Forward Neural Networks (FFNN) trained on epigenetic data and one-dimensional convolutional neural networks (CNN) trained on DNA sequence data can successfully predict active regulatory regions in different cell lines. We show that model selection by means of Bayesian optimization applied to both FFNN and CNN models can significantly improve deep neural network performance, by automatically finding models that best fit the data. Further, we show that techniques applied to balance active and non-active regulatory regions of the human genome in training and test data may lead to over-optimistic or poor predictions. We recommend using actual imbalanced data that was not used to train the models for evaluating their generalization performance.

Luca Cappelletti, Alessandro Petrini, Jessica Gliozzo, Elena Casiraghi, Max Schubach, Martin Kircher, Giorgio Valentini
DiS-TSS: An Annotation Agnostic Algorithm for TSS Identification

The spread, distribution and utilization of transcription start site (TSS) experimental evidence within promoters are poorly understood. Cap Analysis of Gene Expression (CAGE) has emerged as a popular gene expression profiling protocol, able to quantitate TSS usage by recognizing the 5′ end of capped RNA molecules. However, an increasing number of studies in the literature suggest that CAGE can also detect 5′ capping events that are transcription byproducts. These findings highlight the need for computational methods that can effectively remove the excessive amount of noise from CAGE samples, leading to accurate TSS annotation and promoter usage quantification. In this study, we present an annotation-agnostic computational framework, DIANA Signal-TSS (DiS-TSS), which for the first time utilizes digital signal processing-inspired features customized to the peculiarities of CAGE data. Features from the spatial and frequency domains are combined with a robustly trained Support Vector Machines (SVM) model to accurately distinguish between peaks related to real transcription initiation events and biological or protocol-induced noise. When benchmarked on experimentally derived data on active transcription marks as well as annotated TSSs, DiS-TSS was found to outperform existing implementations, providing on average ~11k positive predictions and a performance increase of ~5% in the experimental and annotation-based evaluations.

Dimitris Grigoriadis, Nikos Perdikopanis, Georgios K. Georgakilas, Artemis Hatzigeorgiou
LM-Based Word Embeddings Improve Biomedical Named Entity Recognition: A Detailed Analysis

Recent studies have shown that contextualized word embeddings outperform other types of embeddings on a variety of tasks. However, little research has been done to evaluate their effectiveness in the biomedical domain under multi-task settings. We derive contextualized word embeddings from the Flair framework and apply them to the task of biomedical NER on 5 benchmark datasets, yielding major improvements over the baseline and achieving competitive results against the current best systems. We analyze the sources of these improvements, reporting model performance over different combinations of word embeddings, fine-tuning, and casing modes.

Liliya Akhtyamova, John Cardiff
Evaluating Mutual Information and Chi-Square Metrics in Text Features Selection Process: A Study Case Applied to the Text Classification in PubMed

The aim of this work was to compare the behavior of mutual information and Chi-square as metrics for evaluating the relevance of terms extracted from documents related to “software design” retrieved from the PubMed database, tested in two contexts: using a set of terms retrieved from the vectorization of the corpus of abstracts, and using only the terms retrieved from the vocabulary defined by the IEEE standard ISO/IEC/IEEE 24765. A search was conducted on the subject “software” over the last 6 years, and the Medical Subject Headings (MeSH) term “software design” of the articles was used to label them. Then mutual information and Chi-square were computed as metrics to sort and select features. Chi-square obtained the highest accuracy scores in document classification using a multinomial naive Bayes classifier. Although these results suggest that Chi-square is better than mutual information for feature relevance estimation in the context of this work, further research is necessary to obtain a consistent foundation for this conclusion.
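For a binary class, the Chi-square score of a term can be computed from a 2×2 contingency table of document counts. A minimal sketch with the standard shortcut formula; the counts are illustrative, not from the study:

```python
# Chi-square score of a term for a binary class from a 2x2 contingency table:
# a = in-class docs containing the term, b = out-of-class docs containing it,
# c, d = the corresponding counts for docs NOT containing the term.
def chi_square(a, b, c, d):
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

score = chi_square(a=40, b=10, c=20, d=80)
```

Terms are then ranked by this score and the top-k kept as features before training the naive Bayes classifier.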

José Párraga-Valle, Rodolfo García-Bermúdez, Fernando Rojas, Christian Torres-Morán, Alfredo Simón-Cuevas
Profiling Environmental Conditions from DNA

DNA is quintessential for organisms to carry out basic functions, as it encodes information necessary for metabolomics and proteomics, among others. In particular, it is common nowadays to use DNA for profiling living organisms based on their phenotypic traits. These traits are the outcomes of the genetic makeup, constrained by the interaction between living organisms and their surrounding environment over time. For environmental conditions, however, the conventional assumption is that they are too random and ephemeral to be encoded in the DNA of an organism. Here, we demonstrate that, on the contrary, genomic DNA may also encode sufficient information about some environmental features of an organism’s habitat for a machine learning model to reveal them, although there seem to be exceptions, i.e. some environmental features do not appear to be coded in DNA, unless our methods miss that information. Nevertheless, we demonstrate that these features can be used to train better models for better predictions of other environmental factors. These results lead directly to the question of whether, over evolutionary history, DNA itself is also a repository of information related to the environment where the lineage has developed, perhaps even more cryptically than the way it encodes phenotypic information.
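Training a machine learning model on genomic DNA requires turning sequences into fixed-length feature vectors; a k-mer frequency profile is one common, simple choice (the abstract does not specify the authors' features, so this is purely illustrative):

```python
# A k-mer frequency profile as a simple DNA-derived feature vector;
# k = 2 and the sequence are illustrative.
from collections import Counter

def kmer_profile(seq, k=2):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

profile = kmer_profile("ACGTACGT", k=2)
```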

Sambriddhi Mainali, Max H. Garzon, Fredy A. Colorado
Stability of Feature Selection Methods: A Study of Metrics Across Different Gene Expression Datasets

Analysis of gene-expression data often requires that a gene (feature) subset is selected, and many feature selection (FS) methods have been devised. However, FS methods often generate different lists of features for the same dataset, and users then have to choose which list to use. One approach to support this choice is to apply stability metrics to the generated lists and select lists on that basis. The aim of this study is to investigate the behavior of stability metrics applied to feature subsets generated by FS methods. The experiments in this work explore a plethora of gene expression datasets, FS methods, and expected numbers of features to compare several stability metrics. The stability metrics have been used to compare five feature selection methods (SVM, SAM, ReliefF, RFE + RF and LIMMA) on gene expression datasets from the EBI repository. Results show that the studied stability metrics display a high amount of variability. The reason behind this is not yet clear and is being further investigated. The final objective of the research, namely defining how to select an FS method, is ongoing work whose partial findings are reported herein.
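A simple stability metric of the kind compared in such studies is the mean pairwise Jaccard similarity of the feature subsets a method returns across resamples of the data. A minimal sketch (one of many possible metrics; the gene lists are illustrative):

```python
# Stability of an FS method as mean pairwise Jaccard similarity of the
# gene subsets it selects across dataset resamples.
from itertools import combinations

def jaccard(a, b):
    return len(a & b) / len(a | b)

def stability(subsets):
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

runs = [
    {"BRCA1", "TP53", "MYC"},
    {"BRCA1", "TP53", "EGFR"},
    {"BRCA1", "MYC", "EGFR"},
]
s = stability(runs)  # 1.0 = identical lists every run, 0.0 = disjoint lists
```

Other metrics (e.g. Kuncheva's index) correct for the similarity expected by chance given the subset size, which plain Jaccard does not.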

Zahra Mungloo-Dilmohamud, Yasmina Jaufeerally-Fakim, Carlos Peña-Reyes

Medical Image Processing

Frontmatter
Influence of Sarcopenia on Bone Health Parameters in a Group of Elderly Lebanese Men

Sarcopenia is a disease characterized by the loss of muscle mass and strength. The aim of the current study was to explore the influence of sarcopenia on bone health parameters in a group of elderly Lebanese men. To do so, we compared bone health parameters (Bone Mineral Content (BMC), Bone Mineral Density (BMD) and femoral neck geometry indices) in a group of elderly men with sarcopenia and a group of elderly men with normal Skeletal Muscle mass Index (SMI). 23 sarcopenic men (SMI < 7 kg/m2) and 23 men with normal SMI (>7 kg/m2) participated in our study. Body composition and bone variables were measured by Dual-energy X-ray Absorptiometry (DXA). DXA measurements were completed for the Whole Body (WB), Lumbar spine (L1–L4), Total Hip (TH) and Femoral Neck (FN). Hip geometry parameters including Cross-Sectional Area (CSA), Cross-Sectional Moment of Inertia (CSMI), section modulus (Z), Strength Index (SI) and Buckling Ratio (BR) were derived by DXA. Age and height were not significantly different between the two groups. Weight, Body Mass Index (BMI), lean mass, fat mass, appendicular lean mass, SMI, WB BMC, TH BMD, FN BMD, CSA, CSMI and Z were significantly higher in non-sarcopenic men compared to sarcopenic men. In the whole population, lean mass was the strongest determinant of bone health parameters. After adjusting for lean mass, there were no significant differences regarding bone health parameters between the two groups. In conclusion, the present study suggests that sarcopenia negatively influences bone health parameters in elderly Lebanese men.

Amal Antoun, Hayman Saddick, Antonio Pinti, Riad Nasr, Eric Watelain, Eric Lespessailles, Hechmi Toumi, Rawad El Hage
Novel Thermal Image Classification Based on Techniques Derived from Mathematical Morphology: Case of Breast Cancer

Image processing (IP) converts an image to digital form and performs operations on it to obtain an improved image or to extract useful information from it. It is a type of signal processing in which the input is an image, such as a video frame or a photograph, and the output can be an image or associated characteristics. This processing treats images as two-dimensional signals, applying signal processing methods already defined for them. One of the essential steps in IP is the combined application of erosion and dilation procedures, which are part of mathematical morphology. This article presents a novel thermal image classification based on techniques derived from mathematical morphology. In the processing of grayscale breast cancer images, this method reveals the region of interest as the whitest area.

Sebastien Mambou, Ondrej Krejcar, Ali Selamat, Michal Dobrovolny, Petra Maresova, Kamil Kuca
Data Preprocessing via Multi-sequences MRI Mixture to Improve Brain Tumor Segmentation

Automatic brain tumor segmentation is one of the crucial problems in domains where the daily clinical workflow requires considerable effort in studying computed tomography (CT) or structural magnetic resonance imaging (MRI) scans of patients with various pathologies. MRI is the most common method of primary detection and non-invasive diagnostics, and a source of recommendations for further treatment. The brain is a complex structure, different areas of which have different functional significance. In this paper, we propose a robust pre-processing technique which considers all available information from MRI scans by composing the T1, T1C and FLAIR sequences into a unique input. Such an approach enriches the input data for the automatic segmentation process and helps to improve segmentation accuracy. The proposed method demonstrates significant improvement on the binary segmentation problem with respect to the Dice and Recall metrics, compared to a similar training/evaluation procedure based on any single sequence, regardless of the chosen neural network architecture. The obtained results show significant improvement when combining the three MRI sequences, either as a weighted mixture yielding a 1-channel mixed image or as a 3-channel RGB-like image, for both considered problems: binary brain tumor segmentation with and without inclusion of edema in the region of interest (ROI). Final improvements on the test part of the data set are in the range of 5.6–9.1% for the single-fold trained model according to the Dice metric, with a best value of 0.902 without considering a priori “empty” slices. We also demonstrate a strong impact on the Recall metric, with growth of up to 9.5% in one setting and up to 11% in the other.

Vladimir Groza, Bair Tuchinov, Evgeniy Pavlovskiy, Evgeniya Amelina, Mihail Amelin, Sergey Golushko, Andrey Letyagin
Brain MRI Modality Understanding: A Guide for Image Processing and Segmentation

Medical image processing is a highly challenging research area; medical imaging techniques are used to make diagnoses in the human body. Moreover, since a brain tumor is a critical medical condition, image segmentation plays an important role in delineating the tumor and providing suspicious-region diagnosis from the medical images. With the help of MRI scanners, signals generated by human body tissues can be detected and spatially localized. In this paper we present the basics of MRI image modalities as a guide for understanding the related processes and methods. Since the original brain image is not appropriate for examination, segmentation can be a very useful method for partitioning the digital image into similar regions. This research thus presents a guide for understanding brain MRI sequences, in other words modalities.

Ayca Kirimtat, Ondrej Krejcar, Ali Selamat
Computer-Aided Breast Cancer Diagnosis from Thermal Images Using Transfer Learning

Breast cancer is one of the most prevalent types of cancer. Early diagnosis and treatment of breast cancer are of vital importance for patients. Various imaging techniques are used in the detection of cancer. Thermal images are obtained by a thermal camera using the temperature differences of regions, without exposing the patient to radiation. In this study, we present methods for computer-aided diagnosis of breast cancer using thermal images. To this end, various Convolutional Neural Networks (CNNs) have been designed using the transfer learning methodology. The performance of the designed nets was evaluated on a benchmarking dataset considering accuracy, precision, recall, F1 measure, and Matthews Correlation Coefficient. The results show that an architecture holding pre-trained convolutional layers while training newly added fully connected layers achieves a better performance compared with the others. We obtained an accuracy of 94.3%, a precision of 94.7% and a recall of 93.3% using the transfer learning methodology with CNN.
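Among the evaluation metrics listed, the Matthews Correlation Coefficient is the least commonly spelled out; it is computed directly from the confusion-matrix counts. A minimal sketch (the counts are illustrative, not the paper's results):

```python
# Matthews Correlation Coefficient from confusion-matrix counts;
# +1 = perfect prediction, 0 = random, -1 = total disagreement.
import math

def mcc(tp, tn, fp, fn):
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

score = mcc(tp=70, tn=65, fp=5, fn=10)
```

Unlike accuracy, MCC stays informative when the benign/malignant classes are imbalanced, which is why it is often reported alongside precision and recall.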

Çağrı Cabıoğlu, Hasan Oğul
Blood Cell Types Classification Using CNN

White Blood Cells, also known as leukocytes, play an important role in the human body, increasing immunity by fighting infectious diseases. The classification of White Blood Cells plays an important role in the detection of disease in an individual. It can also assist with the identification of diseases like infections, allergies, anemia, leukemia, cancer, Acquired Immune Deficiency Syndrome (AIDS), etc. that are caused by anomalies in the immune system. This classification will assist the hematologist in distinguishing the types of White Blood Cells present in the human body and in finding the root cause of diseases. A large amount of research is currently ongoing in this field. Considering the significance of WBC classification, we use a deep learning technique, Convolutional Neural Networks (CNN), to classify images of WBCs into their subtypes, namely Neutrophil, Eosinophil, Lymphocyte and Monocyte. In this paper, we report the results of various experiments executed on the Blood Cell Classification and Detection (BCCD) dataset using CNN.

Ishpreet Singh, Narinder Pal Singh, Harnoor Singh, Saharsh Bawankar, Alioune Ngom
Medical Image Data Upscaling with Generative Adversarial Networks

Super-resolution is one of the most frequently investigated methods of image processing. The quality of the results is a constant problem for the methods used to obtain high-resolution images. Interpolation-based methods suffer from blurry outputs, while non-interpolation methods require a lot of training data and high computing power. In this paper, we present a supervised generative adversarial network system that accurately generates high-resolution images from a low-resolution input while maintaining pathological invariance. The proposed solution is optimized for small sets of input data. Compared to existing models, our network also provides faster learning. Another advantage of our approach is its versatility across various types of medical imaging methods. We used peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as the output image quality evaluation methods. The results of our test show an improvement of 5.76% compared to the Adam optimizer used in the original paper [10]. For faster training of the neural network model, calculations on a graphics card with the CUDA architecture were used.
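Of the two quality metrics used, PSNR is the simpler: it is a log-scaled ratio of the peak pixel value to the mean squared error between original and reconstructed images. A minimal sketch for 8-bit data (the pixel values are illustrative):

```python
# Peak signal-to-noise ratio for 8-bit images, flattened to 1-D lists here;
# higher PSNR means the reconstruction is closer to the original.
import math

def psnr(original, reconstructed, peak=255.0):
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

value = psnr([52, 55, 61, 59], [54, 55, 60, 58])
```

PSNR penalizes per-pixel error only, which is why SSIM, sensitive to structural similarity, is usually reported alongside it.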

Michal Dobrovolny, Karel Mls, Ondrej Krejcar, Sebastien Mambou, Ali Selamat
Enhancing Breast Cancer Classification via Information and Multi-model Integration

The integration of different sources of information for proper classification is of utmost importance, especially in the biomedical field. Many different sources of information can be collected from a patient, and they all may contribute to an accurate diagnosis. For example, in cancer these can include gene expression (RNA-Seq) or tissue slide imaging; however, their integration in order to correctly train a classification model is not straightforward. Making use of Whole-Slide Images, this work presents a novel information integration model for when different sources of data from a patient are available, named the Multi-source integration model (MSIM). Using two different Convolutional Neural Network architectures and a Feed Forward Neural Network, the potential of a multi-model integration process which combines the information of different sources is introduced, and its results are presented for breast cancer classification.

J. C. Morales, Francisco Carrillo-Perez, Daniel Castillo-Secilla, Ignacio Rojas, Luis Javier Herrera

Simulation and Visualization of Biological Systems

Frontmatter
Kernel Based Approaches to Identify Hidden Connections in Gene Networks Using NetAnalyzer

The latest advances in biotechnology are increasing the size and number of biological databases, especially those related to “omics” sciences. These data can be used to generate complex interaction networks, whose analysis allows biological information to be extracted. Network analysis constitutes a current bioinformatics challenge, and the implementation of kernels offers a promising procedure to perform this analysis. Kernel algebraic functions have been used to study interaction networks and are of major interest in new applications to improve machine learning studies. To manage these interaction networks, the NetAnalyzer tool was developed with the purpose of analysing multi-layer networks, calculating different probabilistic indices to establish the association between pairs of nodes. In this study we implement different kernel operations using several programming languages to inspect their reliability in performing these operations in different scenarios. The best performers have been included as a kernel functional module in NetAnalyzer, and we used them over gene interaction networks and gene-disease knowledge to identify disease-causing genes.

Fernando Moreno Jabato, Elena Rojano, James R. Perkins, Juan Antonio García Ranea, Pedro Seoane-Zonjic
Comprehensive Analysis of Patients with Undiagnosed Genetic Diseases Using the Patient Exploration Tools Suite (PETS)

Systemic approaches based on network analysis have been successful in associating pathological phenotypes observed in patients with their affected genomic regions. Previously we have used phenotype-genotype associations to determine the genetic causes that lead to pathological phenotypes observed in patients with rare and complex disorders. However, these studies were limited as many of these associations had low specificity, frequently associating pathological phenotypes such as intellectual disability or growth abnormality with multiple regions of the genome. To help solve this problem, we propose that the phenotypic characterisation of patients using more specific terms will substantially improve the determination of the genetic causes that produce them. In this work we present the Patient Exploration Tools Suite (PETS), which includes three tools to: (1) determine the quality of information within a patient cohort; (2) associate genomic regions with their pathological phenotypes based on the cohort data; and (3) predict the possible genetic variants that cause the clinically observed pathological phenotypes using phenotype-genotype association values. This tool has been developed to be used by the clinical community, to facilitate patient characterisation, help identify where data quality can be improved within a cohort and help diagnose patients with complex disease.

Elena Rojano, Pedro Seoane-Zonjic, Fernando M. Jabato, James R. Perkins, Juan A. G. Ranea
Method of Detecting Orientation of Red Blood Cells Based on Video Data

This article proposes a methodology to estimate the orientation of red blood cells flowing in laboratory microfluidic devices. The input for this methodology is the video output from a laboratory experiment, with the assumption that cells undergo moderate deformation in the microfluidic device. The methodology is based on the hypothesis that we can identify the position and inclination of a cell if we know the dimensions of its 2D projection. We applied the methodology to cells from numerical simulations. We compared the exact values of extremal cell point coordinates with the values obtained using only restricted knowledge about the bounding box of the cell's 2D projection. We quantified the accuracy of estimating the 3D position of the cell from the 2D projection data. We found good agreement, mainly for the estimation of the 3rd dimension of the cell's bounding box, when only the two dimensions of the bounding box of the 2D projection are known.
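The geometric core of the hypothesis can be illustrated with a toy calculation: if the true length of a (moderately deformed) cell is known, the foreshortening of its 2D projection constrains its inclination relative to the image plane. This sketch is only the basic trigonometric idea, not the paper's estimator, and the dimensions are illustrative:

```python
# Toy estimate of cell inclination from the foreshortening of its projection:
# a cell of known length whose projected length is shorter must be tilted.
import math

def inclination_deg(true_length, projected_length):
    ratio = min(projected_length / true_length, 1.0)  # guard against noise > 1
    return math.degrees(math.acos(ratio))

angle = inclination_deg(true_length=8.0, projected_length=4.0)  # microns
```

A projection half as long as the cell implies a 60-degree tilt; real data add deformation and noise, which is what the paper's bounding-box comparison quantifies.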

Kristina Kovalcikova, Michal Duracik
Automated Tracking of Red Blood Cells in Images

Computer simulations of processes inside microfluidic devices often require validation data from experiments with biological cells. Besides single-cell experiments, data involving the flow of many cells can be used to validate many-cell behaviour. Manual data gathering from such experiments, such as from video sequences of cell flow, is inefficient and needs to be automated. Building on top of automated detection of red blood cells, tracking of red blood cells is needed in order to provide physical data about each cell. In this work, we first describe our existing traditional algorithms for cell tracking. We assess possible metrics for measuring their performance and iterate on the flaws of our algorithm in order to design improvements and propose a neural network solution.

František Kajánek, Ivan Cimrák, Peter Tarábek
A Novel Prediction Model for Discovering Beneficial Effects of Natural Compounds in Drug Repurposing

Natural compounds are promising leads in drug discovery due to their low toxicity and the synergistic effects existing in nature, providing efficient and low-cost therapeutic solutions. Synergistic effects are observed in highly similar or closely related compounds, where the combined effect is much more significant than individual usage. However, multiple hurdles exist in the identification of similar compounds, in particular the accumulation of large volumes of compounds, procurement of authentic information, the diversity and complexity of the compounds, convoluted mechanisms of action, the need for high-throughput screening and validation techniques, and, most importantly, the incompleteness of critical information such as indications for natural compounds. Currently, not many comprehensive computational pipelines are available for drug discovery using natural products. To overcome these challenges, in this study we focus on predicting highly similar candidate compounds with synergistic effects useful in combinatorial/alternative therapies. We developed a molecular compound similarity prediction model for computing four different compound-compound similarity scores based on (i) bioactivity, (ii) chemical structure, (iii) target enzyme, and (iv) protein functional domain, using data from public repositories. The calculated scores are combined efficiently to predict highly similar compound pairs with similar biological or physicochemical properties. We evaluate the accuracy of our model against pharmacological and bioassay results and manually curated literature from PubChem, NCBI, etc. As a use case, we selected 415 compounds based on 13 functional categories, out of which 66 natural compounds with 198 compound-compound similarity scores were identified as top candidates based on similar bioactivities, chemical substructures, targets, and protein functional sites. Statistical analysis of the scores revealed a significant difference in the mean similarity scores for all four categories. Twenty-eight closely interacting compounds, including Quercetin and Apigenin, were identified as candidates for combinational therapies showing synergistic effects. Herbs including dill, basil, garlic, and mint were predicted as potential combinations for achieving synergistic effects. Twenty-four compounds with unknown pharmacological effects were associated with 58 potential new pharmacological effects/indications. If applied broadly, this model can address many problems in chemogenomics and help identify novel drug targets and indications, a critical step in natural drug discovery research and evidence for drug repurposing.
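Combining the four per-aspect similarity scores into one ranking score can be as simple as a weighted sum. The abstract does not state how the scores are combined, so the equal weights below are an assumption for illustration only:

```python
# Hypothetical combination of the four compound-compound similarity scores
# (bioactivity, chemical structure, target enzyme, protein functional domain)
# into one ranking score; equal weights are an assumption, not the paper's.
def combined_similarity(scores, weights=(0.25, 0.25, 0.25, 0.25)):
    return sum(w * s for w, s in zip(weights, scores))

pair_score = combined_similarity((0.9, 0.8, 0.7, 0.6))
```

Pairs are then ranked by this score, and the top pairs inspected as candidates for combinational therapies.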

Suganya Chandrababu, Dhundy Bastola
Assisted Generation of Bone Fracture Patterns

We present a method for the computer-assisted generation of bone fracture patterns. A tool has been designed for the interactive, system-guided generation of a fracture pattern, based on the study of real cases of fractures. This tool assists the specialist in obtaining fracture patterns according to certain rules derived from the statistical analysis of real cases. It can be used for the generation of biological databases of fractures. Another use of the fracture patterns could be the generation of virtual fractures, by applying a fracture pattern to 3D osseous models.

Gema Parra-Cabrera, Francisco Daniel Pérez-Cano, Adrián Luque-Luque, Juan José Jiménez-Delgado
Backmatter
Metadata
Title
Bioinformatics and Biomedical Engineering
Editors
Prof. Ignacio Rojas
Prof. Olga Valenzuela
Prof. Fernando Rojas
Luis Javier Herrera
Dr. Francisco Ortuño
Copyright Year
2020
Electronic ISBN
978-3-030-45385-5
Print ISBN
978-3-030-45384-8
DOI
https://doi.org/10.1007/978-3-030-45385-5