2008 | Book

Computational Intelligence in Biomedicine and Bioinformatics

Current Trends and Applications

Edited by: Tomasz G. Smolinski, Mariofanna G. Milanova, Aboul-Ella Hassanien

Publisher: Springer Berlin Heidelberg

Book series: Studies in Computational Intelligence

Table of Contents

Frontmatter

Techniques and Methodologies

Frontmatter
Computational Intelligence in Solving Bioinformatics Problems: Reviews, Perspectives, and Challenges
Summary
This chapter presents a broad overview of Computational Intelligence (CI) techniques, including Artificial Neural Networks (ANN), Particle Swarm Optimization (PSO), Genetic Algorithms (GA), Fuzzy Sets (FS), and Rough Sets (RS). We review a number of applications of computational intelligence to problems in bioinformatics and computational biology, including gene expression, gene selection, cancer classification, protein function prediction, multiple sequence alignment, and DNA fragment assembly. We discuss representative methods that illustrate how CI can be applied to solve bioinformatics problems and how bioinformatics data can be analyzed, processed, and characterized by computational intelligence. Challenges to be addressed and future directions of research are presented. An extensive bibliography is also included.
Aboul-Ella Hassanien, Mariofanna G. Milanova, Tomasz G. Smolinski, Ajith Abraham
Data Mining and Genetic Algorithms: Finding Hidden Meaning in Biological and Biomedical Data
Summary
The amount of biological and biomedical data being accumulated continues to grow at incredible rates. Having tools that can search through these enormous databases is of critical importance to the advancement of research. Data mining is a field of research in Computer Science that specializes in examining large collections of data and extracting patterns that occur within the data. One useful technique for performing data mining is the genetic algorithm, a process that mimics evolution. This chapter highlights data mining and the genetic algorithm technique, surveys many applications where data mining tools have been beneficial to biological and biomedical researchers, and lists some of the available data mining tools.
Christopher M. Taylor, Arvin Agah
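
As a rough illustration of the idea of a genetic algorithm as "a process that mimics evolution", the sketch below evolves bit strings by selection, crossover, and mutation. It is not taken from the chapter: the function name `evolve`, the parameters, the tournament size of three, and the toy fitness function (counting ones) are all assumptions chosen for brevity.

```python
import random


def evolve(fitness, length, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Evolve bit strings of the given length and return the fittest one.

    All names and defaults here are illustrative assumptions.
    Requires length >= 2 and pop_size >= 3.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [pop[0][:]]  # elitism: keep the current best unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of three random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # One-point crossover.
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


# Toy problem (hypothetical): maximise the number of ones in a 16-bit string.
best = evolve(sum, 16)
```

In a data mining setting the bit string would typically encode a candidate rule or feature subset, and the fitness function would score it against the data.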
The Use of Rough Sets as a Data Mining Tool for Experimental Bio-data
Summary
The Rough Sets methodology has great potential for mining experimental data. Since its introduction by Pawlak, it has received a lot of attention in the computing community. However, due to the mathematical nature of the Rough Sets methodology, many experimental scientists lacking sufficient mathematical background have been hesitant to use it. The goal of this chapter is twofold: (1) to introduce the "Rough Sets" methodology (along with one of its derivatives, "Modified Rough Sets") in a non-mathematical fashion, in the hope of sharing the potential of this approach with a larger group of non-computationally-oriented scientists (mining of one specific form of implicit data within a bio-dataset is also discussed), and (2) to apply this methodology to a dataset of children with and without Attention Deficit/Hyperactivity Disorder (ADHD), to demonstrate the usefulness of the approach in patient differentiation. The Discriminant Analysis statistical approach and the ID3 approach were also applied to the same dataset, to compare the approaches and find out which is most effective.
Ray R. Hashemi, Alexander A. Tyler, Azita A. Bahrami
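
The core notion of Rough Sets, the lower and upper approximation of a set of records, can be sketched in a few lines. This is a generic textbook-style illustration rather than the chapter's own procedure (and it does not cover "Modified Rough Sets"); the function name `approximations`, the attribute names, and the toy records are made up.

```python
from collections import defaultdict


def approximations(records, attributes, target):
    """Return the (lower, upper) approximations of `target`.

    records:    dict mapping record id -> dict of attribute values
    attributes: attributes used to decide which records are indiscernible
    target:     set of record ids (the concept to approximate)
    """
    # Records with identical values on `attributes` cannot be told apart.
    classes = defaultdict(set)
    for rid, row in records.items():
        classes[tuple(row[a] for a in attributes)].add(rid)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper


# Hypothetical toy data, not from the chapter's dataset.
records = {
    1: {"a": "yes", "b": "yes"},
    2: {"a": "yes", "b": "yes"},
    3: {"a": "yes", "b": "no"},
    4: {"a": "no", "b": "no"},
}
concept = {1, 3}
lower, upper = approximations(records, ["a", "b"], concept)
```

Records 1 and 2 are indiscernible but only record 1 belongs to the concept, so they fall in the upper approximation only; record 3 is certain and lands in the lower approximation.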
Integrating Local and Personalised Modelling with Global Ontology Knowledge Bases for Biomedical and Bioinformatics Decision Support
Summary
A novel ontology-based decision support framework and a development platform are described, which allow for the creation of a global knowledge representation for local and personalised modelling and decision support. The main modules are an ontology module and a machine learning module. Both modules evolve through continuous learning from new data. Results from the machine learning procedures can be fed back into the ontology, thus enriching its knowledge base and facilitating new discoveries. This framework supports global, local and personalised modelling. The latter is a process of model creation for a single person, based on their personal data and the information available in the ontology. Several methods for local and personalised modelling, both traditional and new, are described. A case study is presented on a brain-gene-disease ontology, where a set of 12 genes related to central nervous system cancer is revealed from existing data and local profiles of patients are derived. Through ontology analysis, these genes are found to be related to different functions, areas, and other diseases of the brain. Two other case studies discussed in the paper are chronic disease ontology and risk evaluation, and cancer gene ontology and prognosis.
Nikola Kasabov, Qun Song, Lubica Benuskova, Paulo Gottgtroy, Vishal Jain, Anju Verma, Ilkka Havukkala, Elaine Rush, Russel Pears, Alex Tjahjana, Yingjie Hu, Stephen MacDonell

Computational Intelligence in Biomedicine

Frontmatter
Data-Mining of Time-Domain Features from Neural Extracellular Field Data
Summary
Spike-wave and polyspike-wave activity in the electroencephalogram are waveforms typical of certain epileptic states. Automated detection of such patterns would be desirable for automated seizure detection in both experimental and clinical settings. We have developed a time-domain algorithm called SPUD to facilitate data-mining of large electroencephalogram/electrocorticogram datasets to identify the occurrence of spike-wave or other activity patterns. This algorithm feeds into our enhanced Neural Query System [2, 12] database application to facilitate data-mining. We have used our algorithm to identify and classify activity from both simulated and experimental seizures.
Samuel Neymotin, Daniel J. Uhlrich, Karen A. Manning, William W. Lytton
Analysis of Spectral Data in Clinical Proteomics by Use of Learning Vector Quantizers
Summary
Clinical proteomics based on mass spectrometry has gained tremendous visibility in the scientific and clinical community. Machine learning methods are key to efficient processing of the complex data. One major class is that of prototype-based algorithms. Prototype-based vector quantizers or classifiers are intuitive approaches realizing the principle of characteristic representatives for data subsets or decision regions between them. Examples of such tools are Support Vector Machines (SVM) [1], Kohonen's Learning Vector Quantization (LVQ) [2], Self-Organizing Maps (SOMs) [2], Supervised Relevance Neural Gas (SRNG) [3] and respective variants. Depending on the task, one can distinguish between unsupervised methods for data representation and supervised methods for classification. New developments include the utilization of non-standard metrics (functional norms, scaled Euclidean) and task-dependent automatic metric adaptation (feature selection), fuzzy classification, and similarity-based visualization of data. These properties offer new possibilities for the analysis of mass spectrometric data. In this contribution we concentrate on recent extensions of SOMs as universal tools in the light of clinical proteomics. We focus on non-standard metrics and biomarker pattern discovery. We consider extensions of the standard SOM and LVQ for handling more general metrics. In particular, we demonstrate applications of the weighted Euclidean metric and the weighted functional norm (based on the weighted L_p-norm) or kernelized metrics taking the specific nature of mass spectra into account. This allows an efficient feature selection, which may be used for biomarker identification. The adaptation of the algorithms to these specific requirements leads to effective tools for knowledge discovery while keeping the robustness of the original simple approaches. Further, we consider fuzzy classification and regression in the determination of clinical proteomics models.
This topic deals with the wide-ranging problem of uncertainty of data. Particularly in medicine, the classification of mass spectra may be subject to individual human assessment (based on some expert knowledge), multi-impairment diseases, and incomplete patient/proband information. This leads to the problem of uncertainty of training data in machine learning databases. We developed a semi-supervised approach based on SOM to process such data. As a result, the algorithm provides a fuzzy classification scheme based on prototypes for the classification of spectra (Fuzzy Labeled SOM, FLSOM).
We demonstrate the usefulness of the above extensions of the basic prototype-based data analysis by SOMs for the analysis of mass spectra in proteomics and related knowledge discovery. In particular, we give application examples for biomarker detection based on feature selection and fuzzy classification of spectra combined with similarity-based class visualization.
Frank-Michael Schleif, Thomas Villmann, Barbara Hammer, Martijn van der Werff, A. Deelder, R. Tollenaar
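
To make the prototype-based idea concrete, here is a minimal sketch of one training step of basic LVQ1 in which the winning prototype is chosen under a weighted squared Euclidean distance. It is an assumption-laden illustration, not the chapter's method: the relevance weights are fixed by hand rather than adapted, and the names `weighted_sq_dist` and `lvq1_step` as well as the toy numbers are hypothetical.

```python
def weighted_sq_dist(x, w, relevance):
    """Weighted squared Euclidean distance between sample x and prototype w."""
    return sum(r * (a - b) ** 2 for a, b, r in zip(x, w, relevance))


def lvq1_step(prototypes, x, label, relevance, lr=0.1):
    """Apply one LVQ1 update and return (new_prototypes, winner_index).

    prototypes: list of (vector, class_label) pairs
    relevance:  per-feature weights; fixed here, whereas relevance-learning
                variants would adapt them during training
    """
    # The winner is the prototype closest to x under the weighted metric.
    winner = min(range(len(prototypes)),
                 key=lambda i: weighted_sq_dist(x, prototypes[i][0], relevance))
    vec, cls = prototypes[winner]
    # Attract the winner if its class matches the sample, repel it otherwise.
    sign = 1.0 if cls == label else -1.0
    new_vec = [w + sign * lr * (a - w) for a, w in zip(x, vec)]
    updated = list(prototypes)
    updated[winner] = (new_vec, cls)
    return updated, winner


# Hypothetical two-feature example with one prototype per class.
protos = [([0.0, 0.0], "control"), ([1.0, 1.0], "case")]
protos, winner = lvq1_step(protos, [0.2, 0.0], "control",
                           relevance=[1.0, 1.0], lr=0.5)
```

A feature with a relevance weight near zero contributes almost nothing to the distance, which is how a learned weighting can double as feature selection.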
Computational Intelligence Techniques in Image Segmentation for Cytopathology
Summary
A variety of computational intelligence approaches to nuclei segmentation in microscope images of fine needle biopsy material is presented in this chapter. Segmentation is one of the most important steps of automatic medical diagnosis based on the analysis of microscopic images, and is crucial to making a correct diagnostic decision. Due to the complex nature of biological images, standard segmentation methods are not effective enough. In this chapter we present and discuss some modified versions of the watershed algorithm, active contours, cellular automata, and the GrowCut technique, as well as new approaches such as type I and type II fuzzy sets and the sonar-like method.
Andrzej Obuchowicz, Maciej Hrebień, Tomasz Nieczkowski, Andrzej Marciniak
Curvature Flow Based 3D Surface Evolution Model for Polyp Detection and Visualization in CT Colonography
Summary
Computerized Tomography (CT) colonography is an emerging noninvasive technique for screening and diagnosing colon cancers. Since colonic polyps grow outward from the colon wall, they are modeled as protrusion shapes. In this chapter, we propose a novel anisotropic 3D surface evolution model for detecting protrusion-shaped colonic polyps on the curved surface. The important feature of the proposed model is that it can detect protrusions with both convex and concave shapes. A protrusion is defined as an extension beyond the usual limits or above a plane surface. Based on Gaussian and mean curvature flows, the approach works by locally deforming the convex or concave surface until the second principal curvature goes to zero. The diffusion directions are changed to prevent convex surfaces from converting into concave shapes, and vice versa. The deformation field quantitatively measures the amount of protrusion. We also designed a new color coding scheme for better visualization of the detected polyps. The proposed method has been evaluated using synthetic phantoms and real colon datasets.
Dongqing Chen, Aly A. Farag, M. Sabry Hassouna, Robert L. Falk, Gerald W. Dryden
Assisting Cancer Diagnosis with Fuzzy Neural Networks
Summary
Cancer diagnosis from huge microarray gene expression data is an important and challenging bioinformatics research topic. We used a previously proposed fuzzy neural network (FNN) for cancer classification. This FNN has three valuable features: automatic generation of fuzzy membership functions, parameter optimization, and rule-base simplification. One major obstacle in classifying microarray data sets is that the number of features (genes) is much larger than the number of objects. We therefore used a feature selection method based on the t-test to select the more significant genes before applying the FNN. In this work we used three well-known microarray databases: the lymphoma data set, the small round blue cell tumor (SRBCT) data set, and the ovarian cancer data set. In all cases we obtained 100% accuracy with fewer genes in comparison with previously published results. Our results show that the FNN classifier not only improves the accuracy of cancer classification but also helps biologists to find a better relationship between important genes and the development of cancers.
Feng Chu, Wei Xie, Farideh Fazayeli, Lipo Wang
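
A t-test based gene ranking of the kind mentioned above can be sketched as follows. The abstract does not say which t-test variant was used, so Welch's unequal-variance statistic is an assumption here, as are the function names `welch_t` and `rank_genes` and the toy expression values.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both groups are constant; t is undefined")
    return (mean(a) - mean(b)) / se


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names ranked by absolute t statistic.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of 0/1 class labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group0, group1))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: six samples, three per class.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],  # clearly separates the classes
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no difference in means
    "geneC": [0.5, 1.5, 1.0, 1.2, 2.0, 1.6],  # weak separation
}
selected = rank_genes(expression, labels, top_k=2)
```

Only the selected genes would then be passed to the classifier, which reduces the gap between the number of features and the number of samples.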
Computational Intelligence in Clinical Oncology: Lessons Learned from an Analysis of a Clinical Study
Summary
In this chapter, we present a retrospective clinical study where the adoption of computational intelligence approaches for performing knowledge extraction from gene expression data enabled an improved oncological clinical analysis. This study focuses on a survival analysis of estrogen receptor (ER) positive breast cancer patients treated with tamoxifen. The chapter describes each step of the gene expression data analysis procedure, from the quality control of data to the final validation going through normalization, feature transformation, feature selection, and model building. Each section proposes a set of guidelines and motivates the specific choice made for this particular study. Finally, the main guidelines that emerged from this study are the use of simple and effective techniques rather than complex non-linear models, the use of interpretable methods and the use of scalable computational solutions able to deal with multiplatform and multisource data.
B. Haibe-Kains, C. Desmedt, S. Loi, M. Delorenzi, C. Sotiriou, G. Bontempi

Computational Intelligence in Bioinformatics

Frontmatter
Artificial Immune Systems in Bioinformatics
Summary
Artificial Immune Systems (AIS) represent one of the most recent and promising approaches in the branch of bio-inspired techniques. Although this open field of research is still in its infancy, several relevant results have been achieved by using the AIS paradigm in demanding tasks such as those arising in computational biology and biochemistry. The chapter shows how AIS have been successfully used in computational biology problems and gives readers further hints about possible implementations in unexplored fields. The main goal of the contribution lies in providing both theoretical foundations and hands-on experience that allow researchers to devise novel applications of AIS in bioinformatics and, at the same time, in providing researchers with the necessary insights for implementation in daily research. The contribution is organised in five sections.
Vitoantonio Bevilacqua, Filippo Menolascina, Roberto T. Alves, Stefania Tommasi, Giuseppe Mastronardi, Myriam Delgado, Angelo Paradiso, Giuseppe Nicosia, Alex A. Freitas
Evolutionary Algorithms for the Protein Folding Problem: A Review and Current Trends
Abstract
Proteins are complex macromolecules that perform vital functions in all living beings. They are composed of a chain of amino acids. The biological function of a protein is determined by the way it is folded into a specific three-dimensional structure, known as its native conformation. Understanding how proteins fold is of great importance to Biology, Biochemistry and Medicine. Considering the full analytic atomic model of a protein, it is still not possible to determine the exact three-dimensional structure of real-world proteins, even with the most powerful computational resources. To reduce the computational complexity of the analytic model, many simplified models have been proposed. Even the simplest one, the two-dimensional Hydrophobic-Polar (2D-HP) model (see Sect. 12.2.2), has been proved to be intractable due to its NP-completeness. The current approach for studying the structure of proteins is the use of heuristic methods that, however, do not guarantee the optimal solution. Evolutionary computation techniques have proved to be efficient for many engineering and computer science problems. This is also the case for unveiling the structure of proteins using simple lattice models.
Heitor Silvério Lopes
Flexible Protein Folding by Ant Colony Optimization
Summary
Protein structure prediction is one of the most challenging topics in bioinformatics. Since a protein's structure is closely related to its functions, predicting the folded structure of a protein in order to infer its functions is of great value. This chapter proposes a flexible ant colony (FAC) algorithm for solving protein folding problems (PFPs) based on the hydrophobic-polar (HP) square lattice model. Unlike previous ant algorithms for PFPs, the proposed algorithm places the pheromones on the arcs connecting adjacent squares in the lattice. This pheromone placement model is similar to the one used in the traveling salesman problem (TSP), where pheromones are released on the arcs connecting the cities. Moreover, the collaboration of effective heuristic and pheromone strategies greatly enhances the performance of the algorithm, so that it can achieve good results without local search methods. Tests on benchmark two-dimensional hydrophobic-polar (2D-HP) protein sequences show that the proposed algorithm is quite competitive with other well-known methods for solving the same protein folding problems.
Xiao-Min Hu, Jun Zhang, Yun Li
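
For readers unfamiliar with the 2D-HP square lattice model, the sketch below scores a fold with the commonly used energy function: each pair of hydrophobic (H) residues that are lattice neighbours but not chain neighbours contributes -1. This illustrates the model only, not the FAC algorithm; the function name `hp_energy` and the encoding of a fold as a string of absolute moves (U, D, L, R) are assumptions.

```python
def hp_energy(sequence, moves):
    """Energy of a fold in the 2D-HP square lattice model.

    sequence: string of 'H' and 'P' residues
    moves:    string of absolute moves ('U', 'D', 'L', 'R'), one per bond
    Returns None if the walk visits a lattice site twice (invalid fold).
    """
    steps = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")

    # Place the residues on the lattice, starting at the origin.
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = steps[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    if len(set(coords)) != len(coords):
        return None  # self-intersecting walk

    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts every neighbouring pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = position.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


# Folding HPPH into a unit square brings the two H residues into contact.
energy = hp_energy("HPPH", "RUL")
```

A search method such as an ant colony or evolutionary algorithm would then look for the move string that minimises this energy.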
Considering Stem-Loops as Sequence Signals for Finding Ribosomal RNA Genes
Summary
Several factors make stem-loops an attractive sequence signal for a structural RNA gene-finder. Structural RNAs are virtually obligated to form stem-loops on their way to forming stable structures. Also, stem-loops can be identified along a sequence of length n in O(n) time. We postulate that stem-loops found in structural RNA genes may tend to be longer than those found in their genomic counterparts - coding sequences and noncoding DNA. We also postulate that stem-loops may occur in higher frequency in the structural RNA regions.
Methods: To examine these possibilities, rRNAs were selected as a test bed. An algorithm was developed to identify stem-loops along a genomic sequence which are similar to those found in rRNA secondary structures. This algorithm scanned the genomes in our training set to establish average metric values observed in rRNA genes. These values were subsequently used in an effort to identify rRNA genes in genomes outside of the training set.
Results: The values for the stem-loop metrics we tested are sensitive to G+C content. Two of the metrics reported here are able to identify rRNA genes when there is a marked difference in G+C content between rRNAs and their genomic counterparts. Another metric has demonstrated an ability to roughly target rRNA genes when there is a negligible difference in G+C content levels.
Conclusions: Our results are encouraging and demonstrate that stem-loops have the potential to act as sequence signals to discover rRNA genes. Our results also suggest that more study into stem-loops is warranted to further improve the performance of our algorithm and to examine the application to a wider population of structural RNA genes.
Kirt M. Noël, Kay C. Wiese
Power-Law Signatures and Patchiness in Genechip Oligonucleotide Microarrays
Summary
Genechip oligonucleotide microarrays have been used widely for transcriptional profiling of a large number of genes in a given paradigm. Gene expression estimation precedes biological inference and is given as a complex combination of atomic entities on the array called probes. These probes are further classified into perfect-match (PM) and mismatch (MM) probes. While the former is a measure of specific binding, the latter is a measure of non-specific binding. The behavior of the MM probes has proven especially elusive. The present study investigates qualitative similarities in the distributional signatures and local correlation structures/patchiness between the PM and MM probe intensities. These qualitative similarities are established on publicly available microarrays generated across laboratories investigating the same paradigm. Persistence of these similarities across raw as well as background-subtracted probe intensities is also investigated. The results presented raise fundamental concerns in interpreting Genechip oligonucleotide microarray data.
Radhakrishnan Nagarajan
Case Study: Structure and Function Prediction of a Protein with No Functionally Characterized Homolog
Summary
The post-genomic era has seen a significant increase in the use of computational prediction methods to gain insights into the structure and function of proteins. Prediction tools are used to guide the experimental design to test various hypotheses about the structure and function of known proteins. However, these tools are particularly useful when studying putative protein sequences with no known function. The genomic era produced a large number of sequences that are described as either hypothetical proteins or as proteins with unknown function. Current molecular biology techniques are not adequate to efficiently study this vast reservoir of genetic information. However, computer algorithms can process large amounts of sequence data to predict structure and function. These knowledge-based computational tools use available experimental data and are regularly updated to improve their predictive power. The simplest form of function prediction is achieved by comparison of the query sequence to all available sequences using BLAST. If the query sequence is highly similar to previously characterized proteins, then it is likely that the query sequence has similar functions. However, if the query sequence does not have any homologous sequence with known function, then more sophisticated computational tools are necessary to gain insight into structure and function. Various methods have been developed to search for known domains, motifs, patterns, or profiles. The quality of predictions depends on the type of tools used and is limited by the closeness of the query sequence to known proteins.
In this chapter, we will describe and discuss methods and tools we used to predict structure and function of a putative protein sequence (Msa) with unknown function. We will address the advantages and limitations of all these approaches by using the Msa protein from the human pathogen Staphylococcus aureus as a case study. Msa is a novel protein that is involved in regulation of virulence. Since Msa has no known homolog, computational tools are being used to predict its structure and mechanism of action. These predictions are used to design experiments to study Msa and explore its use as a therapeutic target to combat antibiotic-resistant infections.
Vijayaraj Nagarajan, Mohamed O. Elasri
From Biomedical Literature to Knowledge: Mining Protein-Protein Interactions
Summary
To date, more than 16 million citations of published articles in the biomedical domain are available in the MEDLINE database. These articles describe the new discoveries that have accompanied the tremendous development of biomedicine during the last decade. It is crucial for biomedical researchers to be able to retrieve and mine specific knowledge from this huge quantity of published articles with high efficiency. Researchers have been engaged in the development of text mining tools to find knowledge, such as protein-protein interactions, that is most relevant and useful for specific analysis tasks. This chapter provides a road map to the various information extraction methods in the biomedical domain, such as protein name recognition and discovery of protein-protein interactions. Disciplines involved in analyzing and processing unstructured text are summarized. Current work in biomedical information extraction is categorized. Challenges in the field are also presented and possible solutions are discussed.
Deyu Zhou, Yulan He, Chee Keong Kwoh
Backmatter
Metadata
Title
Computational Intelligence in Biomedicine and Bioinformatics
Edited by
Tomasz G. Smolinski
Mariofanna G. Milanova
Aboul-Ella Hassanien
Copyright year
2008
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-70778-3
Print ISBN
978-3-540-70776-9
DOI
https://doi.org/10.1007/978-3-540-70778-3