2020 | Book

Computational Intelligence Methods for Bioinformatics and Biostatistics

15th International Meeting, CIBB 2018, Caparica, Portugal, September 6–8, 2018, Revised Selected Papers

Edited by: Prof. Maria Raposo, Paulo Ribeiro, Susana Sério, Prof. Antonino Staiano, Prof. Angelo Ciaramella

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

About this book

This book constitutes the thoroughly refereed post-conference proceedings of the 15th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2018, held in Caparica, Portugal, in September 2018. The 32 revised full papers were carefully reviewed and selected from 51 submissions. The papers present current trends at the edge of computer and life sciences, the application of computational intelligence to systems and synthetic biology, and the consequent impact on innovative medicine. Theoretical and experimental biologists also presented novel challenges and fostered multidisciplinary collaboration aiming to blend theory and practice, where the founding theories of the techniques used for modelling and analyzing biological systems are investigated and used for practical applications and the supporting technologies.

Table of Contents

Frontmatter
Correction to: Computational Intelligence Methods for Bioinformatics and Biostatistics

In the original version of the book, the affiliations of Antonino Staiano and Angelo Ciaramella were wrong. Both affiliations have been corrected to: Università degli Studi di Napoli Parthenope.

Maria Raposo, Paulo Ribeiro, Susana Sério, Antonino Staiano, Angelo Ciaramella

Computational Intelligence Methods for Bioinformatics and Biostatistics

Frontmatter
Compressive Sensing and Hierarchical Clustering for Microarray Data with Missing Values

Gene expression microarray measurements commonly contain multiple missing expression values, and handling them properly is a critical task. To address this issue, this paper proposes a novel methodology, based on a compressive sensing mechanism, for analyzing gene expression data on the basis of the topological characteristics of gene expression time series. Once the data are recovered, they are processed with a non-linear PCA for dimensionality reduction and a hierarchical clustering algorithm for agglomeration and visualization. Experiments were performed on the yeast Saccharomyces cerevisiae dataset with different percentages of information loss. The approach shows robust performance when a high percentage of information is lost and when few sampled data are available.

Angelo Ciaramella, Davide Nardone, Antonino Staiano
Variational Inference in Probabilistic Single-cell RNA-seq Models

Single-cell sequencing technology holds the promise of unravelling cell heterogeneities hidden in ubiquitous bulk-level analyses. However, limitations of current experimental methods also pose new obstacles that prevent accurate conclusions from being drawn. To overcome this, researchers have developed computational methods which aim at extracting the biological signal of interest from the noisy observations. In this paper we focus on probabilistic models designed for this task. Particularly, we describe how variational inference constitutes a powerful inference mechanism for different sample sizes, and critically review two recent scRNA-seq models which use it.

Pedro F. Ferreira, Alexandra M. Carvalho, Susana Vinga
Centrality Speeds the Subgraph Isomorphism Search Up in Target Aware Contexts

The subgraph isomorphism (SubGI) problem is known to be NP-complete. Several methodologies use heuristic approaches to solve it, differing in the strategy used to search for occurrences of one graph in another. This choice strongly influences the computational effort they require. We investigate seven search strategies in which global and local topological properties of the graphs are exploited by means of weighted graph centrality measures. Results on benchmarks of biological networks show the competitiveness of the proposed seven alternatives and that, among them, local strategies predominate on sparse target graphs, while closeness- and eigenvector-based strategies perform best on dense graphs.

Vincenzo Bonnici, Simone Caligola, Antonino Aparo, Rosalba Giugno
Structure-Based Antibody Paratope Prediction with 3D Zernike Descriptors and SVM

Antibodies currently represent the most valuable category of biopharmaceuticals for both diagnostic and therapeutic applications. They are a class of Y-shaped proteins capable of specifically recognizing and binding to a virtually infinite number of antigens. Being able to identify the antigen-binding residues in an antibody structure is crucial for all antibody design methods and for shedding light on the complex mechanisms that govern antigen recognition and binding. This paper presents a method for antibody interface prediction from their experimentally-solved structures based on 3D Zernike Descriptors. Roto-translationally invariant descriptors are computed from circular patches of the antibody surface enriched with a chosen subset of physicochemical properties from the AAindex1 amino acid index set, and are used as samples for a binary classification problem. An SVM classifier is used to distinguish interface surface patches from non-interface ones. The proposed method outperforms other antigen-binding interface prediction software, namely Paratome, Antibody i-Patch and Parapred, on a novel dataset of experimentally-solved antibody–antigen complex structures.

Sebastian Daberdaku
Simultaneous Phasing of Multiple Polyploids

We address the problem of phasing polyploids, specifically those with ploidy larger than two. We consider the scenario where the input is the genotype of samples along a genic chromosomal segment. In this setting, instead of NGS reads of the segments of a sample, genotype data from multiple individuals is available for simultaneous phasing. For this mathematically interesting problem, with application in plant genomics, we design and test two algorithms under a parsimony model. The first is a linear-time greedy algorithm and the second is a more carefully crafted algebraic algorithm. We show that both methods work reasonably well (with an average accuracy larger than 80%). The former is very time-efficient and the latter improves the accuracy further.

Laxmi Parida, Filippo Utro
Classification of Epileptic Activity Through Temporal and Spatial Characterization of Intracranial Recordings

Focal epilepsy is a chronic condition characterized by hyper-activity and abnormal synchronization of a specific brain region. For pharmacoresistant patients, the surgical resection of the critical area is considered a valid clinical solution; an accurate localization is therefore crucial to minimize neurological damage. In current clinical routine, the characterization of the Epileptogenic Zone (EZ) is performed using invasive methods, such as Stereo-ElectroEncephaloGraphy (SEEG). Medical experts label the neural electrophysiological recordings by visually inspecting the acquired data, a highly time-consuming and subjective procedure. Here we show the results of an automatic multi-modal classification method for the evaluation of critical areas in focal epileptic patients. The proposed method is an attempt to characterize brain areas by integrating anatomical information on neural tissue, inferred using Magnetic Resonance Imaging (MRI), with spectral features extracted from SEEG recordings.

Vanessa D’Amario, Gabriele Arnulfo, Lino Nobili, Annalisa Barla
Committee-Based Active Learning to Select Negative Examples for Predicting Protein Functions

The Automated Functional Prediction (AFP) of proteins has become a challenging problem in bioinformatics and biomedicine, aiming at handling and interpreting the extremely large proteomes of several eukaryotic organisms. A central issue in AFP is the absence, in public repositories for protein functions such as the Gene Ontology (GO), of well-defined sets of negative examples from which to learn accurate classifiers. In this paper we investigate the Query by Committee paradigm of active learning to select the negatives that are most informative for the classifier and the protein function to be inferred. We validated our approach in predicting the Gene Ontology functions of S. cerevisiae proteins.

Marco Frasca, Maryam Sepehri, Alessandro Petrini, Giuliano Grossi, Giorgio Valentini
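
As a rough illustration of the Query by Committee idea mentioned in this abstract, the sketch below ranks unlabeled candidates by how much a committee of classifiers disagrees about them, using vote entropy. This is a generic textbook formulation, not the authors' method: the function names `vote_entropy` and `most_informative`, the toy threshold classifiers, and the candidate values are all assumptions made for illustration.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's votes for one example (0 when unanimous)."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def most_informative(candidates, committee, k=1):
    """Return the k candidates on which the committee disagrees the most."""
    scored = [(vote_entropy([clf(x) for clf in committee]), x) for x in candidates]
    # Sort by disagreement only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:k]]


# Hypothetical committee: three threshold classifiers on a one-dimensional score.
committee = [lambda x, t=t: x > t for t in (0.2, 0.5, 0.8)]
candidates = [0.1, 0.6, 0.9]
print(most_informative(candidates, committee))
```

In this toy setting the committee is unanimous on 0.1 and 0.9 and split on 0.6, so 0.6 is the example that would be queried first.
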
A Graphical Tool for the Exploration and Visual Analysis of Biomolecular Networks

Many interactions among bio-molecular entities, e.g. genes, proteins, metabolites, can be easily represented by means of property graphs, i.e. graphs that are annotated both on the vertices (e.g. entity identifier, Gene Ontology or Human Phenotype Ontology terms) and on the edges (the strength of the relationship, the evidence of the source from which the weight has been taken, etc.). These graphs contain relevant information that can be exploited for different kinds of analysis, such as automatic function prediction, disease gene prioritization, and drug repositioning. However, the number and size of the networks are becoming quite large, and there is a need for tools that allow biologists to manage the networks, graphically explore their structures, and organize the visualization and analysis of the graph according to different perspectives. In this paper we introduce the web service that we have developed for the visual analysis of biomolecular networks. Specifically, we show the different functionalities for exploring big networks (that do not fit in the current canvas) starting from a specific vertex, for changing the view perspective of the network, and for navigating the network and thus identifying new relationships. The proposed system extends the functionalities of off-the-shelf graphical visualization tools (e.g. GraphViz and GeneMania) by limiting the production of big clouds of points, allowing further customized visualizations of the network, and introducing vertex-centric exploration.

Cheick Tidiane Ba, Elena Casiraghi, Marco Frasca, Jessica Gliozzo, Giuliano Grossi, Marco Mesiti, Marco Notaro, Paolo Perlasca, Alessandro Petrini, Matteo Re, Giorgio Valentini
Improved Predictor-Corrector Algorithm

The differential geometric least angle regression (dgLARS) method consists essentially in computing the solution path. In Augugliaro et al. [4], this problem is satisfactorily solved by using a predictor-corrector (PC) algorithm, which however has the drawback of becoming intractable when working with thousands of predictors. Using the PC algorithm leads to an increase in the run times needed for computing the solution curve. In this paper we explain an improved version of the PC algorithm (IPC), proposed in Pazira et al. [9], that reduces the effect of this problem when computing the solution curve. The IPC algorithm allows the dgLARS method to be implemented with fewer arithmetic operations, which leads to potential computational savings.

Hassan Pazira
Identification of Key miRNAs in Regulation of PPI Networks

In this paper, we explore the interaction between miRNA and deregulated proteins in some pathologies. Assuming that miRNA can influence mRNA and consequently the regulation of proteins, we explore this connection by using an interaction matrix derived from miRNA-target data and PPI network interactions. From this interaction matrix and the set of deregulated proteins, we search for the miRNA subset that influences the deregulated proteins with a minimum impact on the non-deregulated ones. This regulation problem can be formulated as a complex optimization problem, which we try to solve using a genetic algorithm heuristic. As the main result, we have found a set of miRNA that is known to be involved in disease development.

Antonino Fiannaca, Laura La Paglia, Massimo La Rosa, Giosué Lo Bosco, Riccardo Rizzo, Alfonso Urso
Recurrent Deep Neural Networks for Nucleosome Classification

Nucleosomes are the fundamental repeating unit of chromatin. A nucleosome is a complex of eight histone proteins to which approximately 147–150 base pairs of DNA bind. Several biological studies have clearly stated that the regulation of cell type-specific gene activities is influenced by nucleosome positioning. Bioinformatic studies have improved on those results, showing proof of sequence specificity in the DNA fragments of nucleosomes. In this work, we present a recurrent neural network that classifies nucleosomes from a representation of their sequence features. In particular, we implement an architecture which stacks convolutional and long short-term memory layers, with the main purpose of avoiding the feature extraction and selection steps. We have computed classifications using eight datasets of three different organisms with a growing genome complexity, from yeast to human. We have also studied the capability of the model trained on the most complex species to recognize nucleosomes of the other organisms.

Domenico Amato, Mattia Antonino Di Gangi, Giosuè Lo Bosco, Riccardo Rizzo

Modeling and Simulation Methods in System Biology

Frontmatter
Searching for the Source of Difference: A Graphical Model Approach

A growing body of evidence shows that when performing differential analysis it is highly beneficial to go beyond differences in the level of individual genes, and consider differences in their interactions as well. We propose an original statistical approach that identifies the set of variables driving the difference between two conditions under study. Our proposal, set within the framework of Gaussian graphical models, is implemented in the R package SourceSet, which also extends the analysis from a single pathway to multiple pathways and provides several graphical outputs, including Cytoscape visualization to browse the results.

Vera Djordjilović, Monica Chiogna, Chiara Romualdi, Elisa Salviato
A New Partially Segment-Wise Coupled Piece-Wise Linear Regression Model for Statistical Network Structure Inference

We propose a new non-homogeneous dynamic Bayesian network with partially segment-wise sequentially coupled network parameters. The idea is to infer the segmentation of a time series of network data using multiple changepoint processes, and to model the data in each segment by linear regression models. The conventional uncoupled models infer the network interaction parameters for each segment separately, without any systematic information-sharing among segments. More recently, it was proposed to couple the network interaction parameters sequentially among segments. The idea is to enforce the parameters of any segment to stay similar to those of the previous segment. This coupling mechanism can be disadvantageous, as it enforces coupling and does not feature any options to uncouple. We propose a new consensus model that infers for each individual segment whether it should be coupled to (or better should stay uncoupled from) the preceding one.

Mahdi Shafiee Kamalabad, Marco Grzegorczyk
Inhibition of Primed Ebola Virus Glycoprotein by Peptide Compound Conjugated to HIV-1 Tat Peptide Through a Virtual Screening Approach

The high prevalence of Ebola hemorrhagic fever is caused by Ebola virus (EBOV), which enters the host cell through the macropinocytosis mechanism. During the entry process, the primed viral glycoprotein (GPcl) interacts with a lysosomal cholesterol transporter, Niemann Pick C1 (NPC1), leading to the fusion of the viral envelope and the host endosomal membrane. Hence, disrupting the interaction between EBOV GPcl and host NPC1 is a promising way to prevent the viral nucleocapsid content from entering the cytoplasm. In this study, a virtual screening approach has been used to investigate peptide compounds conjugated to HIV-1 Tat peptide as drug lead candidates inhibiting EBOV GPcl. About 50,261 peptides from the NCBI PubChem database, which act as ligands, were subjected to initial toxicological screening to omit ligands with undesired properties. The remaining ligands underwent a pharmacophore search, rigid docking, and flexible docking simulation to discover ligands with favorable inhibition activities. Calfluxin, SNF 8906, grgesy, phosphoramidon, and endothelin (16-21) were the five ligands with a lower ΔGbinding value than the standard ligand. The chosen ligands were subjected to absorption, distribution, metabolism, excretion, and toxicity (ADME-Tox) analysis, carried out with the pkCSM software. Subsequently, they were conjugated to HIV-1 Tat peptide to accumulate them inside the endosome. The inhibition activity was reevaluated by a second flexible molecular docking simulation. As a result, only C-Calfluxin showed improved affinity while maintaining minimal conformational changes in the protein-peptide interaction compared to its respective unconjugated ligand.

Ahmad Husein Alkaff, Mutiara Saragih, Mochammad Arfin Fardiansyah Nasution, Usman Sumo Friend Tambunan
Pharmacophore Modelling, Virtual Screening, and Molecular Docking Simulations of Natural Product Compounds as Potential Inhibitors of Ebola Virus Nucleoprotein

Ebola virus (EBOV) remains a serious public health issue: it infected at least 27,000 people and claimed the lives of about 11,000 people in the latest Ebola outbreak in 2014. Although the virus has been known for almost 40 years, there is currently no approved drug for it. Hence, the development of a new drug candidate for Ebola is required to prepare for possible future outbreaks. In this research, about 229,538 natural product (NP) compounds were retrieved and screened using a computational approach against EBOV nucleoprotein (NP). First, all NP compounds were screened through computational toxicity and druglikeness prediction tests, followed by pharmacophore-based virtual screening and molecular docking simulation to identify their binding affinity and molecular interaction in the RNA-binding groove of EBOV NP. All of the results were compared to 18β-glycyrrhetinic acid, the standard molecule of EBOV NP. In the end, five NP compounds (UNPD213871, UNPD199951, UNPD124962, UNPD139843, and UNPD147202) were identified as having promising activities against EBOV NP. Based on the results of this study, these compounds appear to have potential inhibition activities against EBOV NP and can be proposed for further in silico and in vitro studies.

Mochammad Arfin Fardiansyah Nasution, Ahmad Husein Alkaff, Ilmi Fadhilah Rizki, Ridla Bakri, Usman Sumo Friend Tambunan
Global Sensitivity Analysis of Constraint-Based Metabolic Models

In recent years, detailed genome-wide metabolic models have been proposed, paving the way to thorough investigations of the connection between genotype and phenotype in human cells. Nevertheless, classic modeling and dynamic simulation approaches (based either on the integration of differential equations, Markov chains or hybrid methods) are still unfeasible on genome-wide models due to the lack of detailed information about kinetic parameters and initial molecular amounts. By relying on a steady-state assumption and constraints on extracellular fluxes, constraint-based modeling provides an alternative means, computationally less expensive than dynamic simulation, for the investigation of genome-wide biochemical models. Still, the predictions provided by constraint-based analysis methods (e.g., flux balance analysis) are strongly dependent on the choice of flux boundaries. To contain possible errors induced by erroneous boundary choices, a rational approach suggests focusing on the pivotal ones. In this work we propose a novel methodology for the automatic identification of the key fluxes in large-scale constraint-based models, exploiting variance-based sensitivity analysis and distributing the computation on massively multi-core architectures. We show a proof-of-concept of our approach on core models of relatively small size (up to 314 reactions and 256 chemical species), highlighting the computational challenges.

Chiara Damiani, Dario Pescini, Marco S. Nobile
Efficient and Settings-Free Calibration of Detailed Kinetic Metabolic Models with Enzyme Isoforms Characterization

Mathematical modeling and computational analyses are essential tools to understand and gain novel insights on the functioning of complex biochemical systems. In the specific case of metabolic reaction networks, which are regulated by many other intracellular processes, various challenging problems hinder the definition of compact and fully calibrated mathematical models, as well as the execution of computationally efficient analyses of their emergent dynamics. These problems occur especially when the model explicitly takes into account the presence and the effect of different isoforms of metabolic enzymes. Since the kinetic characterization of the different isoforms is usually unavailable, Parameter Estimation (PE) procedures are typically required to properly calibrate the model. To address these issues, in this work we combine the descriptive power of Stochastic Symmetric Nets, a parametric and compact extension of the Petri Net formalism, with FST-PSO, an efficient and settings-free metaheuristic for global optimization that is suitable for the PE problem. To prove the effectiveness of our modeling and calibration approach, we investigate here a large-scale kinetic model of human intracellular metabolism. To efficiently execute the large number of simulations required by PE, we exploit LASSIE, a deterministic simulator that offloads the calculations onto the cores of Graphics Processing Units, thus allowing a drastic reduction of the running time. Our results attest that estimating isoform-specific kinetic parameters makes it possible to predict how the knock-down of specific enzyme isoforms affects the dynamic behavior of the metabolic network. Moreover, we show that, thanks to LASSIE, we achieved a speed-up of $$\sim 30\times$$ with respect to the same analysis carried out on Central Processing Units.

Niccolò Totis, Andrea Tangherloni, Marco Beccuti, Paolo Cazzaniga, Marco S. Nobile, Daniela Besozzi, Marzio Pennisi, Francesco Pappalardo

Computational Models in Health Informatics and Medicine

Frontmatter
Automatic Discrimination of Auditory Stimuli Perceived by the Human Brain

Humans are able to perceive small differences in sound frequency, but it is still unknown how the difference in frequency information is represented at the level of the primary sensory cortex. Analysis of fMRI imaging has identified tonotopic maps through the auditory pathways to the primary sensory cortex, but these maps are unfortunately too coarse to show ultra-fine discrimination. The hypothesis is therefore that these small frequency differences are recognised thanks to the information coming from a large set of auditory neurons. To investigate this possibility, a multi-voxel pattern discriminating analysis was performed on the BOLD-fMRI response of the bilateral auditory cortex to tonal stimuli with different shifts in frequency. Our results suggest that small shifts in frequency are easily classified compared with big shifts and that multiple areas of the auditory cortex are involved in tone recognition.

Angela Serra, Antonio della Pietra, Marcus Herdener, Roberto Tagliaferri, Fabrizio Esposito
Neural Models for Brain Networks Connectivity Analysis

Functional MRI (fMRI) currently attracts huge interest in the machine learning community. In this work we propose a novel data augmentation procedure based on analysing the inherent noise in fMRI. We then use the augmented dataset for the classification of subjects by age and gender, showing a significant improvement in the accuracy of Recurrent Neural Networks. We test the new data augmentation procedure on the fMRI dataset of an international consortium of neuroimaging data for healthy controls: the Human Connectome Project (HCP). From the analysis of this dataset, we also show how the differences in acquisition habits and preprocessing pipelines require the development of representation learning tools. In the present paper we apply autoencoder deep learning architectures and present their uses in resting state fMRI, using the proposed data augmentation technique. This research field appears to be unexpectedly undeveloped so far, and could potentially open new important and interesting directions for future analysis.

Razvan Kusztos, Giovanna Maria Dimitri, Pietro Lió
Exposing and Characterizing Subpopulations of Distinctly Regulated Genes by K-Plane Regression

Understanding the roles and interplays of histone marks and transcription factors in the regulation of gene expression is of great interest in the development of non-invasive and personalized therapies. Computational studies at genome-wide scale represent a powerful explorative framework from which general conclusions can be drawn. However, a genome-wide approach only identifies generic regulative motifs, and possible multi-functional or co-regulative interactions may remain concealed. In this work, we hypothesize the presence of a number of distinct subpopulations of transcriptional regulative patterns within the set of protein coding genes that explain the statistical redundancy observed at a genome-wide level. We propose the application of a K-Plane Regression algorithm to partition the set of protein coding genes into clusters with specific shared regulative mechanisms. Our approach is completely data-driven and computes clusters of genes significantly better fitted by specific linear models, in contrast to single regressions. These clusters are characterized by distinct and sharper histonic input patterns, and different mean expression values.

Fabrizio Frasca, Matteo Matteucci, Marco J. Morelli, Marco Masseroli
Network Propagation-Based Semi-supervised Identification of Genes Associated with Autism Spectrum Disorder

Autism Spectrum Disorder (ASD) is an etiologically and clinically heterogeneous neurodevelopmental disorder with more than 800 putative risk genes. This heterogeneity, coupled with the low penetrance of most ASD-associated mutations, presents a challenge in identifying the relevant genetic determinants of ASD. We developed a machine learning semi-supervised gene scoring and classification method based on network propagation, using a variant of the random walk with restart algorithm to identify and rank genes according to their association with known ASD-related genes. The method combines information from protein-protein interactions and positive (disease-related) and negative (disease-unrelated) genes. Our results indicate that the proposed method can classify held-out known disease genes in a cross-validation setting with good performance (area under the receiver operating characteristic curve $$\sim 0.85$$, area under the precision-recall curve $$\sim 0.8$$ and Matthews correlation coefficient 0.57). We found a set of top-ranking novel candidate genes identified by the method to be significantly enriched for pathways related to synaptic transmission and ion transport and specific neurotransmitter-associated pathways previously shown to be associated with ASD. Most of the novel candidate genes were found to be targeted by de novo single nucleotide variants in ASD patients.

Hugo F. M. C. Martiniano, Muhammad Asif, Astrid Moura Vicente, Luís Correia
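
To make the network propagation idea in this abstract concrete, here is a minimal sketch of the plain (textbook) random walk with restart on an unweighted, undirected graph. It is not the variant used by the authors: the function name, the `restart` value of 0.3, and the four-node toy network with a single seed are assumptions chosen only for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that, at each step,
    jumps back to a seed node with probability `restart`.

    adjacency: dict mapping every node to a list of its neighbours.
    seeds: set of nodes the walker restarts from (e.g. known disease genes).
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: send its remaining mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1.0 - restart) * p[n] / len(seeds)
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a path A - B - C - D with A as the only seed.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(network, {"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

Nodes closer to the seed receive higher scores, which is the property a propagation-based ranking relies on; a real application would use a weighted protein-protein interaction network and many seed genes.
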
Designing and Evaluating Deep Learning Models for Cancer Detection on Gene Expression Data

Transcription profiling enables researchers to understand the activity of the genes in various experimental conditions; in human genomics, abnormal gene expression is typically correlated with clinical conditions. An important application is the detection of genes which are most involved in the development of tumors, by contrasting normal and tumor cells of the same patient. Several statistical and machine learning techniques have been applied to cancer detection; more recently, deep learning methods have been attempted, but they have typically failed to match the performance of classical algorithms. In this paper, we design a set of deep learning methods that can achieve performance similar to that of the best machine learning methods thanks to the use of external information or of data augmentation; we demonstrate this result by comparing the performance of the new methods against several baselines.

Arif Canakoglu, Luca Nanni, Artur Sokolovsky, Stefano Ceri
Analysis of Extremely Obese Individuals Using Deep Learning Stacked Autoencoders and Genome-Wide Genetic Data

Genetic predisposition has been identified as one of the components contributing to the obesity epidemic in modern societies. The aetiology of polygenic obesity is multifactorial, which indicates that lifestyle and environmental factors may influence multiple genes to aggravate this disorder. Several low-risk single nucleotide polymorphisms (SNPs) have been associated with BMI. However, identified loci only explain a small proportion of the variation observed for this phenotype. Because of their linear nature, the genome-wide association studies (GWAS) used to identify associations between genetic variants and the phenotype have had limited success in explaining the heritability variation of BMI and have shown low predictive capacity in classification studies. GWAS ignores the epistatic interactions that less significant variants have on the phenotypic outcome. In this paper we utilise a novel deep learning-based methodology to reduce the high dimensional space in GWAS and find epistatic interactions between SNPs for classification purposes. SNPs were filtered based on the effects their associations have with BMI. Since Bonferroni adjustment for multiple testing is highly conservative, an important proportion of SNPs involved in SNP-SNP interactions are ignored. Therefore, only SNPs with p-values $$< 1 \times 10^{-2}$$ were considered for subsequent epistasis analysis using stacked autoencoders (SAE). This allows the nonlinearity present in SNP-SNP interactions to be discovered through progressively smaller hidden layer units and to initialise a multi-layer feedforward artificial neural network (ANN) classifier. The classifier is fine-tuned to classify extremely obese and non-obese individuals. The performance of classifications using progressively smaller compressed layers was compared and the results reported. The best results were obtained with 2,000 compressed units (SE = 0.949153, SP = 0.933014, Gini = 0.949936, Logloss = 0.1956, AUC = 0.97497 and MSE = 0.054057). Using 50 compressed units it was possible to achieve SE = 0.785311, SP = 0.799043, Gini = 0.703566, Logloss = 0.476864, AUC = 0.85178 and MSE = 0.156315.

Casimiro A. Curbelo Montañez, Paul Fergus, Carl Chalmers, Jade Hind
Predicting the Oncogenic Potential of Gene Fusions Using Convolutional Neural Networks

Predicting the oncogenic potential of a gene fusion transcript is an important and challenging task in the study of cancer development. To date, the available approaches mostly rely on protein domain analysis to provide a probability score explaining the oncogenic potential of a gene fusion. In this paper, a Convolutional Neural Network model is proposed to classify gene fusions as oncogenic or non-oncogenic, exploiting only the protein sequence without protein domain information. Our proposed model obtained an accuracy close to 90% on a dataset of fused sequences.

Marta Lovino, Gianvito Urgese, Enrico Macii, Santa di Cataldo, Elisa Ficarra
Unravelling Breast and Prostate Common Gene Signatures by Bayesian Network Learning

Breast invasive carcinoma (BRCA) and prostate adenocarcinoma (PRAD) are two of the most common types of cancer in women and men, respectively. As hormone-dependent tumours, BRCA and PRAD share considerable underlying biological similarities worth being exploited. The disclosure of gene networks regulating both types of cancers would potentially allow the development of common therapies, greatly contributing to disease management and health economics. A methodology based on Bayesian network learning is proposed to unravel breast and prostate common gene signatures. BRCA and PRAD RNA-Seq data from The Cancer Genome Atlas (TCGA) measured over $$\sim 20000$$ genes were used. A prior dimensionality reduction step based on sparse logistic regression with elastic net penalisation was employed to select a set of relevant genes and provide more interpretable results. The Bayesian networks obtained were validated against information from STRING, a database containing known gene interactions, showing high concordance.

João Villa-Brito, Marta B. Lopes, Alexandra M. Carvalho, Susana Vinga
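
The elastic net penalty mentioned in the abstract combines an L1 term, which drives coefficients to exactly zero and so selects genes, with an L2 term, which stabilises the fit when genes are correlated. The sketch below uses one widely used parameterisation, λ · (α · ‖β‖₁ + (1 − α)/2 · ‖β‖₂²); the abstract does not state the authors' exact form, so the function name, `lam`, and `alpha` are assumptions.

```python
def elastic_net_penalty(coefficients, lam, alpha):
    """Elastic net penalty: lam * (alpha * L1 + (1 - alpha) / 2 * squared L2).

    alpha = 1 gives the lasso penalty, alpha = 0 gives the ridge penalty.
    """
    l1 = sum(abs(b) for b in coefficients)
    l2_squared = sum(b * b for b in coefficients)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


# Hypothetical coefficient vector; the zero entry is a gene that was not selected.
beta = [1.0, -2.0, 0.0]
print(elastic_net_penalty(beta, lam=1.0, alpha=1.0))  # pure lasso
print(elastic_net_penalty(beta, lam=1.0, alpha=0.0))  # pure ridge
print(elastic_net_penalty(beta, lam=1.0, alpha=0.5))  # equal mix
```

In a sparse logistic regression this penalty is added to the negative log-likelihood, and the genes whose coefficients remain non-zero form the reduced set passed on to network learning.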

Engineering Bio-Interfaces and Rudimentary Cells as a Way to Develop Synthetic Biology

Frontmatter
Effect of Epigallocatechin-3-gallate on DMPC Oxidation Revealed by Infrared Spectroscopy

The daily exposure of skin cells to the sun increases the rate of production of free radicals, which threatens the healthy appearance of skin and, even more worrying, damages the structural integrity of tissues and DNA, causing inflammation and carcinogenesis. This work demonstrates the feasibility of using natural agents, in particular tea catechins, to protect lipid membranes from oxidative stress induced by UV radiation exposure. For that purpose, thin cast films prepared from vesicular suspensions of dimyristoylphosphatidylcholine (DMPC) and dimyristoylphosphatidylcholine + (-)-epigallocatechin-3-gallate (DMPC + EGCG) were deposited onto calcium fluoride supports and irradiated with 254 nm UV radiation. The molecular damage after irradiation with UV light was analysed by infrared (IR) spectroscopy together with 2D correlation spectroscopy. Results revealed that the polar moiety of the DMPC phospholipid is the structural target most vulnerable and sensitive to UV radiation. To check whether the presence of the EGCG molecules protects the lipids, principal component analysis (PCA) was applied, leading to the conclusion that EGCG slows down the cascade of oxidative events in the lipid, thus protecting its polar moiety.

Filipa Pires, Bárbara Rodrigues, Gonçalo Magalhães-Mota, Paulo A. Ribeiro, Maria Raposo
Effect of EGCG on the DNA in Presence of UV Radiation

Exposure to ultraviolet (UV) radiation is a current concern since it damages deoxyribonucleic acid (DNA) and increases the likelihood of developing skin cancer. On the other hand, green tea compounds such as (-)-epigallocatechin-3-gallate (EGCG) present several biological properties and are well known for their antioxidant activity. The aim of this work is to evaluate the effect of UV radiation on DNA in the presence of EGCG molecules. The evolution of the UV-visible spectra with UV irradiation suggests that EGCG acts as an intercalating molecule and that a micromolar concentration of EGCG is enough to induce strong degradation of the DNA pyrimidine bases under UV radiation. This finding could lead to a novel class of safe, non-binding molecules capable of affinity interaction with DNA as intercalating molecules, which could be used as anti-tumor drugs.

Thais P. Pivetta, Filipa Pires, Maria Raposo
Non-thermal Atmospheric Pressure Plasmas: Generation, Sources and Applications

Non-thermal atmospheric pressure plasmas, also known as Cold Atmospheric Plasmas (CAPs), are emerging as a potential alternative for cancer treatment since they can be generated at atmospheric conditions and their low temperatures allow interaction with living tissues without thermal damage. This article focuses on the interaction between plasma and a non-cancerous cell line, specifically VERO cells. In-vitro experiments were performed with a custom-made device in order to better understand how CAPs affect non-cancerous cells. The influence of several factors on cell viability after exposure to CAP treatments was also studied: the distance from the device (gap), the duration of the treatment, and the type of treatment (direct or indirect). The results revealed the importance of determining the optimal relation between gap and treatment time, since small variations in either one can lead to different results in cell viability.

Sara Pereira, Érica Pinto, Paulo António Ribeiro, Susana Sério
Adsorption of Triclosan on Sensors Based on PAH/PAZO Thin-Films: The Effect of pH

Triclosan (TCS) is a broad-spectrum antimicrobial and preservative agent widely used in pharmaceuticals and personal care products. It is considered a troubling environmental contaminant because of its toxicity, promotion of bacterial resistance, and estrogenic effects. The harmful presence of TCS in the environment therefore requires the development of dedicated molecular sensors, which in turn requires finding suitable molecular systems capable of giving rise to transduction. In this work, to investigate the affinity of TCS to common polyelectrolytes in an aqueous environment, the adsorption of TCS on thin layer-by-layer (LbL) films of poly(1-(4-(3-carboxy-4-hydroxyphenylazo)benzenesulfonamido)-1,2-ethanediyl, sodium salt) (PAZO) and poly(allylamine hydrochloride) (PAH) polyelectrolytes was studied at different solution pH values and with either PAZO or PAH as the outer layer. Results demonstrated that the PAH layer is the most suitable for adsorbing TCS molecules. These results are of great importance for the development of TCS sensors based on LbL films, since they indicate that the outer layers of LbL films should be positively charged.

Joao Pereira-da-Silva, Paulo M. Zagalo, Goncalo Magalhães-Mota, Paulo A. Ribeiro, Maria Raposo
Detection of Triclosan Dioxins After UV Irradiation – A Preliminar Study

Triclosan (TCS) by itself represents a major health and environmental problem. Also concerning are its photoproducts, various dioxins, which are even more dangerous, creating a need and an opportunity to develop dedicated sensors to detect their presence in water. By processing the feature data with principal component analysis (PCA), the footprint of the dangerous TCS products after irradiation can be clearly outlined. This result allows us to conclude that a TCS sensor device based on the electronic tongue concept can be envisaged.

Gonçalo Magalhães-Mota, Filipa Pires, Paulo A. Ribeiro, Maria Raposo
Backmatter
Metadata
Title
Computational Intelligence Methods for Bioinformatics and Biostatistics
Edited by
Prof. Maria Raposo
Paulo Ribeiro
Susana Sério
Prof. Antonino Staiano
Prof. Angelo Ciaramella
Copyright year
2020
Electronic ISBN
978-3-030-34585-3
Print ISBN
978-3-030-34584-6
DOI
https://doi.org/10.1007/978-3-030-34585-3