About this book

The Springer Handbook of Bio-/Neuroinformatics is the first single-volume book to explain both the basics and the state of the art of three major scientific disciplines in their interaction and mutual relationship: information science, bioinformatics, and neuroinformatics. Bioinformatics is the area of science concerned with the information processes in biology and with the development and application of methods, tools, and systems for storing and processing biological information, thus facilitating new knowledge discovery. Neuroinformatics is the area of science concerned with the information processes in the brain and nervous system and with the development and application of methods, tools, and systems for storing and processing neurobiological information, likewise facilitating new knowledge discovery.
The text contains 62 chapters organized in 12 parts: 6 parts cover topics from information science and bioinformatics, and 6 cover topics from information science and neuroinformatics. Each chapter consists of three main sections: an introduction to the subject area, a presentation of methods, and advanced and future developments. The Springer Handbook of Bio-/Neuroinformatics can be used both as a textbook and as a reference for postgraduate study and advanced research in these areas. The target audience includes students, scientists, and practitioners from the information sciences, biological sciences, and neurosciences.

With Forewords by Shun-ichi Amari of the Brain Science Institute, RIKEN, Saitama and Karlheinz Meier of the University of Heidelberg, Kirchhoff-Institute of Physics and Co-Director of the Human Brain Project.

Table of Contents

Frontmatter

1. Understanding Nature Through the Symbiosis of Information Science, Bioinformatics, and Neuroinformatics

This chapter presents some background information, methods, and techniques of information science, bio- and neuroinformatics in their symbiosis. It explains the rationale, motivation, and structure of the Handbook that reflects on this symbiosis. For this chapter, some text and figures from [1.1] have been used. As the introductory chapter, it gives a brief overview of the topics covered in this Springer Handbook of Bio-/Neuroinformatics with emphasis on the symbiosis of the three areas of science concerned: information science (informatics) (IS), bioinformatics (BI), and neuroinformatics (NI). The topics presented and included in this Handbook provide a far from exhaustive coverage of these three areas, but they clearly show that we can better understand nature only if we utilize the methods of IS, BI, and NI, considering their integration and interaction.

Nikola Kasabov

Understanding Information Processes in Biological Systems

Frontmatter

2. Information Processing at the Cellular Level: Beyond the Dogma

The classical view of information flow within a cell, encoded by the famous central dogma of molecular biology, states that the instructions for producing amino acid chains are read from specific segments of DNA, just as computer instructions are read from a tape, transcribed to informationally equivalent RNA molecules, and finally executed by the cellular machinery responsible for synthesizing proteins. While this has always been an oversimplified model that did not account for a multitude of other processes occurring inside the cell, its limitations are today more dramatically apparent than ever. Ironically, in the same years in which researchers accomplished the unprecedented feat of decoding the complete genomes of higher-level organisms, it has become clear that the information stored in DNA is only a small portion of the total, and that the overall picture is much more complex than the one outlined by the dogma.

The cell is, at its core, an information processing machine based on molecular technology, but the variety of types of information it handles, the ways in which they are represented, and the mechanisms that operate on them go far beyond the simple model provided by the dogma. In this chapter we provide an overview of the most important aspects of information processing that can be found in a cell, describing their specific characteristics, their role, and their interconnections. Our goal is to outline, in an intuitive and nontechnical way, several different views of the cell using the language of information theory.

Alberto Riva

3. Dielectrophoresis: Integrated Approaches for Understanding the Cell

The complex permittivity of a biological cell reflects its substance and structure and thus seems to reflect its function, activity, abnormality, life/death, age, and life expectancy. Although it may be very difficult to measure the complex permittivity of each cell, the movement or behavior of the cell as affected by its complex permittivity can be observed under the microscope. The dielectrophoretic force (DEP force) generated on a particle in a nonuniform electric field causes movement of the particle in accordance with its complex permittivity or polarizing characteristics. Thus, differences in the substance or structure of biological cells lead to differences in their movement or behavior in a nonuniform electric field. The principle of dielectrophoresis (DEP) and the estimation of the DEP force are described in this chapter. The distinctive features of DEP are applied in the separation of biological cells, e.g., leukocytes from erythrocytes or leukemia cells from normal leukocytes. This cell separation ability is affected by the frequency and amplitude of the applied voltage. To estimate the DEP force generated on a single cell, the terminal velocity of the cell in the medium should be measured without taking it out of the DEP device. The procedure to measure the terminal velocity is also described.
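One common way to turn a measured terminal velocity into a force estimate is Stokes' law: at terminal velocity the viscous drag on the cell balances the driving force. The sketch below assumes a spherical cell, low-Reynolds-number flow, and no other significant forces along the direction of motion; the abstract does not state which relation the chapter itself uses, and the function name and numerical values are illustrative only.

```python
from math import pi

def dep_force_from_terminal_velocity(viscosity_pa_s, radius_m, velocity_m_s):
    """Stokes drag on a sphere, taken as equal in magnitude to the DEP force
    once the cell has reached terminal velocity (an assumption of this sketch)."""
    return 6 * pi * viscosity_pa_s * radius_m * velocity_m_s

# Illustrative values: water-like medium, 5 micrometer radius, 10 micrometers per second.
force_n = dep_force_from_terminal_velocity(1.0e-3, 5.0e-6, 10.0e-6)
print(f"{force_n:.2e} N")  # prints 9.42e-13 N
```

The sub-piconewton result for these values suggests why an indirect estimate through the cell's motion is attractive compared with direct force measurement.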

Takeshi Yamakawa, Hiroko Imasato

4. Information Processing at the Genomics Level

A central objective in biology is to identify and characterize the mechanistic underpinnings (e.g., gene, protein interactions) of a biological phenomenon (e.g., a phenotype). Today, it is technologically feasible and commonplace to measure a great number of biomolecular features in a biological system at once, and to systematically investigate relationships between these features and the phenotype or phenomenological feature of interest across multiple spatial and temporal scales. The canonical starting point for such an investigation is typically a real-valued data matrix of N genomic features × M sample features, where N and M are integers, and N is often orders of magnitude greater than M. In this chapter we describe and rationalize the broad concepts and general principles underlying the analytic steps that start from this data matrix and lead to the identification of coherent mathematical patterns in the data that represent potential and testable mechanistic associations. A key challenge in this analysis is how one deals with false positives that largely arise from the high dimensionality of the data. False positives are mathematical patterns that are not coherent (from a technical or statistical standpoint) or coherent patterns that do not correspond to a true mechanistic association (from a biological standpoint).

Alvin T. Kho, Hongye Liu

5. Understanding Information Processes at the Proteomics Level

All living organisms are composed of proteins. Proteins are large, complex molecules made of long chains of amino acids. Twenty different amino acids are usually found in proteins. Proteins are produced on protein-synthesizing machinery directed by codons made of three deoxyribonucleic acid (DNA) bases. DNA is an information storage macromolecule. With the fast advancement of DNA sequencing technology, more and more genomes have been sequenced. Sequence analysis of this exploding genomic information has revealed many novel genes whose molecular and/or biological functions are yet to be determined. The huge amount of genomic information stored in DNA and genes is stationary and heritable. At the cellular level, genomic information flows selectively from DNA to messenger RNA (mRNA) through transcription and from mRNA to proteins through translation for biological functions, such as response to changes in the environment. Different large-scale, high-throughput studies have been performed to investigate the information flow, e.g., transcriptomic profiling using microarray or RNAseq technologies. As a complementary approach to genomics and transcriptomics, proteomics has been developing rapidly to investigate gene expression at the protein level, including quantitative changes, posttranslational modifications, and interactions with other molecules. These protein-level events represent a global view of information processing at the proteomics level. In this chapter, we focus on the description of technological and biological aspects of the information flow from the static genome to the dynamic proteome through gene transcription, protein translation, posttranslational modification, and protein interactions.

Shaojun Dai, Sixue Chen

6. Pattern Formation and Animal Morphogenesis

The millions of species of animals on Earth can be divided into only about 35 phyla based on underlying morphology. Animal bodies are constructed using a small set of structural motifs that, as 19th century embryologists recognized, can be generated spontaneously by nonliving physicochemical processes. The discovery of genes early in the 20th century, and of their molecular identity a few decades later, led to the view that morphology is a consequence of patterned gene expression during development. Advances in mathematical theory and numerical methods in the second half of the 20th century have made it possible to analyze, classify, and simulate patterns that emerge spontaneously in nonlinear dynamical systems.

The body of this chapter is in three sections. The first section (Sect. 6.1) introduces mathematical models and methods of dynamical systems theory. Section 6.2 explains principles and mechanisms of dynamical pattern formation using this theory, while Sect. 6.3 discusses the possible role of these mechanisms in the evolution and development of animal morphology. The mathematical notation is loose and the presentation avoids technicalities, in order to make the chapter more accessible to its intended audience: biologists who have not yet mastered nonlinear dynamical systems theory, and mathematical engineers and physicists seeking opportunities to apply their skills in biology.

The theory shows that macromolecular reaction networks are capable in principle of generating a larger class of patterns than actually occurs. This raises an interesting puzzle: Why do developmental genes only build structures that could build themselves? The question lies at the heart of evo-devo, an emerging scientific program that aims to synthesize evolutionary molecular biology and developmental mechanics. Dynamical models suggest that metazoan developmental genes may have evolved not as generators of morphology, but to stabilize and coordinate self-organizing mechanical and physicochemical processes. Simple simulations show how molecular patterns that now presage anatomical patterns in development may have been a consequence rather than a cause of those patterns in early animal evolution.

Michael G. Paulin

7. Understanding Evolving Bacterial Colonies

Microbial colonies are collections of cells of the same organism (in contrast to biofilms, which comprise multiple species). Within an evolving colony, cells communicate, pass information to their daughters, and assume roles that depend on their spatiotemporal distribution. Thus, they possess a collective intelligence which renders them model systems for studying biocomplexity. Since the early 1990s, a plethora of models have been proposed to investigate and understand bacterial colonies. The majority of these are based on continuum equations incorporating physical and biological phenomena, such as chemotaxis, bacterial diffusion, nutrient diffusion and consumption, and cellular reproduction. Continuum approaches have greatly advanced our knowledge of the likely drivers of colony evolution, but are limited by the fact that diverse methods yield the same or similar solutions. Some researchers have turned instead to agent-based, heuristic approaches, which provide a natural description of complex systems. Yet others have recognized that chemotaxis constitutes an optimization problem, as bacteria weigh nutrient requirement against competition and energy expenditure. This chapter begins with a brief introduction to bacterial colonies and why they have attracted research interest. The experiments on which many of the published models have been based, and the modeling approaches used, are discussed (Sect. 7.1). In Sect. 7.2 a wide cross-section of published models for comparison and contrast is presented. Limitations of existing models are discussed in Sects. 7.3–7.7, and the chapter concludes with current and future trends in this important research area (Sect. 7.8).

Leonie Z. Pipe

Molecular Biology, Genome and Proteome Informatics

Frontmatter

8. Exploring the Interactions and Structural Organization of Genomes

Bioinformatics typically treats genomes as linear DNA sequences, with features annotated upon them. In the nucleus, genomes are arranged in space to form three-dimensional structures at several levels. The three-dimensional organization of a genome contributes to its activity, affecting the accessibility and regulation of its genes, reflecting the particular cell type and the epigenetic state of the cell.

The majority of the cell cycle occurs during interphase. During metaphase and meiosis, chromosomes are highly condensed. By contrast, interphase chromosomes are difficult to visualize by direct microscopy. Several attempts have been made to understand the nature of metaphase chromosomes and genome structures. Approaches to indirectly derive the spatial proximity of portions of a genome have been devised and applied (Fig. 8.1, Table 8.1).

Fig. 8.1 Chromosomes are organized at a variety of levels, several of which can bring portions of the genome that are sequentially separated into close spatial proximity, as introduced in Sect. 8.1 (after Fraser et al. [8.1])

Table 8.1 Outline of the experimental protocols (Fig. 8.1). An outline of the steps taken to prepare sequence data representing regions of spatial proximity. This table should be read with Fig. 8.1 and Sect. 8.2. Some of the methods can also be assessed using (so-called) gene-array profiling; for simplicity only the sequencing options are presented here

Methods: ChIP-PET, 3C, 4C, 5C, Hi-C

Cross-link the complexes

Cross-link chromatin using formaldehyde

Sonicate to shear DNA

Cleave DNA with restriction enzyme (e.g., HindIII)

Immunoprecipitate complexes

Fill in sticky ends with biotin-labeled DNA

Ligate

Ligate, using conditions that favor ligation (joining) of DNA ends from the same cross-linked complex, not between different cross-linked complexes.

Prepare for sequencing, and purify DNA

Linearize DNA

Circularize DNA, LMA

Sonicate to shear DNA; purify using streptavidin beads

Sequencing

Direct sequencing

PCR

Inverse PCR

Sequence copies

End-pair sequencing

This chapter reviews these approaches briefly and examines early methods used to investigate the structure of a genome from these data. This research involves taking experimental data, processing them with variants of existing bioinformatic DNA sequence analyses, then analyzing the proximity data derived using biophysical approaches. This chapter emphasizes the background to the biological science and the latter, biophysics-oriented, analyses. The processing of the genomic data is outlined only briefly, as these approaches draw on established bioinformatic methods covered elsewhere in this Handbook. The main focus is on the methods used to derive three-dimensional (3-D) structural information from the interaction data.

Grant H. Jacobs

9. Detecting MicroRNA Signatures Using Gene Expression Analysis

Small RNAs such as microRNAs (miRNAs) have been shown to play important roles in genetic regulation of plants and animals. In particular, the miRNAs of animals are capable of downregulating large numbers of genes by binding to and repressing target genes. Although large numbers of miRNAs have been cloned and sequenced, methods for analyzing their targets are far from perfect. Methods exist that can predict the likely binding sites of miRNAs in target transcripts using sequence alignment, thermodynamics or machine learning approaches. It has been widely illustrated that such de novo computational approaches suffer from high false-positive and false-negative error rates. In particular these approaches do not take into account expression information regarding the miRNA or its target transcript. In this chapter we describe the use of miRNA seed enrichment analysis approaches to this problem. In cases where gene or protein expression data are available, it is possible to detect the signature of miRNA binding events by looking for enrichment of microRNA seed binding motifs in sorted gene lists. In this chapter we introduce the concept of miRNA target analysis, the background to motif enrichment analysis, and a number of programs designed for this purpose. We focus on the Sylamer algorithm for miRNA seed enrichment analysis and its applications for miRNA target discovery with examples from real biological datasets.
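The idea of looking for motif enrichment in a sorted gene list can be illustrated with a hypergeometric test: given a ranked list of transcript sequences, ask whether a seed-match word occurs in the top of the list more often than chance would predict. The sketch below is not the Sylamer implementation; the function names, the toy sequences, and the single fixed cutoff are assumptions made purely for illustration.

```python
from math import comb

def hypergeom_upper_tail(total, with_motif, drawn, hits):
    """P(X >= hits) when `drawn` items are sampled without replacement from
    `total` items, of which `with_motif` contain the motif."""
    upper = min(with_motif, drawn)
    favorable = sum(
        comb(with_motif, k) * comb(total - with_motif, drawn - k)
        for k in range(hits, upper + 1)
    )
    return favorable / comb(total, drawn)

def seed_enrichment(ranked_sequences, seed_match, top_n):
    """Enrichment p-value for `seed_match` among the first `top_n` sequences
    of a list sorted by expression change (most affected first)."""
    has_motif = [seed_match in seq for seq in ranked_sequences]
    return hypergeom_upper_tail(
        len(ranked_sequences), sum(has_motif), top_n, sum(has_motif[:top_n])
    )

# Toy ranked list (hypothetical sequences): the word occurs only in the top two.
ranked = ["AAGCACTTA", "GCACTTCC", "TTTT", "CCCC", "GGGG", "AAAA"]
print(seed_enrichment(ranked, "GCACTT", top_n=2))
```

For this toy list both motif-bearing sequences fall in the top two of six, so the p-value is 1/15. A practical analysis would scan many cutoffs and many words and correct for multiple testing.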

Stijn van Dongen, Anton J. Enright

10. Bioinformatic Methods to Discover Cis-regulatory Elements in mRNAs

Cis-regulatory elements play a number of important roles in determining the fate of messenger RNAs (mRNAs). Due to these elements, mRNAs may be translated with remarkable efficiency, or destroyed with little translation. Untranslated regions cover over a third of a typical human mRNA and often contain a range of regulatory elements. Some elements along with their RNA or protein binding partners are well characterized, though many are not. These require different types of bioinformatic methods for identification and discovery. The most successful techniques combine a range of information and search strategies. Useful information may include conservation across species, prior biological knowledge, known false positives, or noisy high-throughput experimental data. This chapter focuses on current successful methods designed to discover elements with high sensitivity but low false-positive rates.

Stewart G. Stevens, Chris M. Brown

11. Protein Modeling and Structural Prediction

Proteins perform crucial functions in every living cell. The genetic information in every organism's DNA encodes the protein's amino acid sequence, which determines its three-dimensional structure, which, in turn, determines its function. In this postgenomic era, protein sequence information can be obtained relatively easily through experimental means. Sequence databases already contain millions of protein sequences and continue to grow. Structural information, however, is harder to obtain through experimental means: we currently know the structure of about 75,000 proteins. Knowledge of a protein's structure is extremely useful in understanding its molecular function and in developing drugs that bind to it. Thus, computational techniques have been developed to bridge the ever-increasing gap between the number of known protein sequences and structures.

In addition to proteins in general, this chapter discusses the specific importance of membrane proteins, which make up about one-third of all known proteins. Membrane proteins control communication and transport into and out of every living cell and are involved in many medically important processes. Over half of current drug targets are membrane proteins.

A brief introduction to protein sequence and structure is followed by an overview of common techniques used in the process of computational protein structure prediction. Emphasis is put on two particularly hard problems, namely protein loop modeling and the structural prediction of membrane proteins.

Sebastian Kelm, Yoonjoo Choi, Charlotte M. Deane

Machine Learning Methods for the Analysis, Modeling and Knowledge Discovery from Bioinformatics Data

Frontmatter

12. Machine Learning Methodology in Bioinformatics

Machine learning plays a central role in the interpretation of many datasets generated within the biomedical sciences. In this chapter we focus on two core topics within machine learning, supervised and unsupervised learning, and illustrate their application to interpreting these datasets. For supervised learning, we focus on support vector machines (SVMs), which are a subtopic of kernel-based learning. Kernels can be used to encode many different types of data, from continuous and discrete data through to graph and sequence data. Given the different types of data encountered within bioinformatics, they are therefore a method of choice within this context. With unsupervised learning we are interested in the discovery of structure within data. We start by considering hierarchical cluster analysis (HCA), given its common usage in this context. We then point out the advantages of Bayesian approaches to unsupervised learning, such as a principled approach to model selection (how many clusters are present in the data) through to confidence measures for assignment of datapoints to clusters. We outline five case studies illustrating these methods. For supervised learning we consider prediction of disease progression in cancer and protein fold prediction. For unsupervised learning we apply HCA to a small colon cancer dataset and then illustrate the use of Bayesian unsupervised learning applied to breast and lung cancer datasets. Finally we consider network inference, which can be approached as an unsupervised or supervised learning task depending on the data available.

Colin Campbell

13. Case-Based Reasoning for Biomedical Informatics and Medicine

Case-based reasoning (CBR) is an integral part of artificial intelligence. It is defined as the process of solving new problems through their comparison with similar ones with existing solutions. The CBR methodology fits well with the approach that healthcare workers take when presented with a new case, making its incorporation into a clinical setting natural. Overall, CBR is appealing in medical domains because a case base already exists, storing symptoms, diagnoses, treatments, and outcomes for each patient. Therefore, there are several CBR systems for medical diagnosis and decision support. This chapter gives an overview of CBR systems, their lifecycle, and different settings in which they appear. It also discusses major applications of CBR in the biomedical field, the methodologies used, and the systems that have been adopted. Section 13.1 provides the necessary background of CBR, while Sect. 13.2 gives an overview of techniques. Section 13.3 presents different systems in which CBR has been successfully applied, and Sect. 13.4 presents biomedical applications. A concluding discussion closes the chapter in Sect. 13.5.

Periklis Andritsos, Igor Jurisica, Janice I. Glasgow

14. Analysis of Multiple DNA Microarray Datasets

In contrast to conventional clustering algorithms, where a single dataset is used to produce a clustering solution, we introduce herein a MapReduce approach for clustering of datasets generated in multiple-experiment settings. It is inspired by the map-reduce functions commonly used in functional programming and consists of two distinctive phases. Initially, the selected clustering algorithm is applied (mapped) to each experiment separately. This produces a list of different clustering solutions, one per experiment. These are further transformed (reduced) by partitioning the cluster centers into a single clustering solution. The obtained partition is not disjoint in terms of the different participating genes, and it is further analyzed and refined by applying formal concept analysis (FCA).
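The two phases described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes k-means as the selected clustering algorithm, assumes all experiments share the same feature space so that their cluster centers can be pooled, and omits the formal concept analysis refinement. All function names are hypothetical.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k points so runs are repeatable."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sqdist(p, centers[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def map_phase(experiments, k):
    """Map: cluster each experiment separately, giving one list of centers per experiment."""
    return [kmeans(points, k) for points in experiments]

def reduce_phase(per_experiment_centers, k):
    """Reduce: pool the centers from all experiments and partition them into k groups."""
    pooled = [c for centers in per_experiment_centers for c in centers]
    return kmeans(pooled, k)

# Two toy experiments measuring the same two features.
exp1 = [(0, 0), (10, 10), (0, 1), (10, 11)]
exp2 = [(1, 0), (11, 10), (1, 1), (11, 11)]
print(reduce_phase(map_phase([exp1, exp2], k=2), k=2))
# prints [[0.5, 0.5], [10.5, 10.5]]
```

Clustering the centers rather than the raw profiles keeps each experiment's structure intact before the solutions are combined, which is the point of separating the map and reduce steps.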

Veselka Boeva, Elena Tsiporkova, Elena Kostadinova

15. Fuzzy Logic and Rule-Based Methods in Bioinformatics

This chapter reviews some fuzzy logic and rule-based approaches in bioinformatics. Among the fuzzy approaches, we emphasize fuzzy neural networks (FNN), which have advantages from both fuzzy logic (e.g., linguistic rules and reduced computation) and neural networks (e.g., ability to learn from data and universal approximation). After the overview in Sect. 15.1, the structure and algorithm of the FNN are reviewed in Sect. 15.2. In Sect. 15.3, we describe a t-test-based gene importance ranking method followed by a description of how we use the FNN to classify three important microarray datasets, for lymphoma, small round blue cell tumor (SRBCT), and ovarian cancer (Sect. 15.4). Section 15.5 reviews various fuzzy and rule-based approaches to microarray data classification proposed by other authors, while Sect. 15.6 reviews fuzzy and rule-based approaches to clustering and prediction in microarray data. We discuss and draw some conclusions in Sect. 15.7.
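As a rough illustration of t-test-based gene ranking, the sketch below scores each gene by the absolute value of Welch's t-statistic between two sample classes and sorts genes by that score. The abstract does not give the exact statistic used in the chapter (several of the datasets have more than two classes), so the two-class Welch form, the function names, and the toy expression values are all assumptions.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def rank_genes(expression, labels):
    """Return gene names sorted by |t| between class 0 and class 1, largest first.

    `expression` maps gene name -> list of values, one per sample;
    `labels` gives the class (0 or 1) of each sample in the same order.
    """
    scores = {}
    for gene, values in expression.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(class0, class1))
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: three genes measured in six samples (three per class).
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "g_flat":   [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "g_strong": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "g_weak":   [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
}
print(rank_genes(expression, labels))
# prints ['g_strong', 'g_weak', 'g_flat']
```

The top-ranked genes would then serve as inputs to a classifier; genes with zero variance in both classes would need special handling, which this sketch leaves out.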

Lipo Wang, Feng Chu, Wei Xie

16. Phylogenetic Cladograms: Tools for Analyzing Biomedical Data

This chapter provides an introduction to phylogenetic cladograms – a systems biology evolutionary-based computational methodology that emphasizes the importance of considering multilevel heterogeneity in living systems when mining data related to these systems. We start by defining intelligence as the ability to predict, because prediction is a very important objective in mining data, especially biomedical data (Sect. 16.1). We then give a brief review of artificial intelligence (AI) and computational intelligence (CI) (Sects. 16.2, 16.3), provide a conciliatory overview of CI, and suggest that phylogenetic cladograms, which provide hypotheses about speciation and inheritance relationships, should be considered to be a CI methodology. We then discuss heterogeneity in biomedical data and talk about data types, how statistical methods blur heterogeneity, and the different results obtained between more traditional CI methodologies (phenetic) and phylogenetic techniques. Finally, we give an example of constructing and interpreting a phylogenetic cladogram tree.

Mones S. Abu-Asab, Jim DeLeo

17. Protein Folding Recognition

Protein folding recognition is a complex problem in bioinformatics where different structures of proteins are extracted from a large amount of harvested data including functional and genetic features of proteins. The data generated consist of thousands of feature vectors with fewer protein sequences. In such a case, we need computational tools to analyze and extract useful information from the vast amount of raw data to predict the major biological functions of genes and proteins with respect to their structural behavior. In this chapter, we discuss the predictability of protein folds using a new hybrid approach for selecting features and classifying protein data using support vector machine (SVM) classifiers with quadratic discriminant analysis (QDA) and principal component analysis (PCA) as generative classifiers to enhance the performance and accuracy. In one of the applied methods, we reduced the data dimensionality by using data reduction algorithms such as PCA. We compare our results with previous results cited in the literature and show that use of an appropriate feature selection technique is promising and can result in a higher recognition ratio compared with other competing methods proposed in previous studies. However, new approaches are still needed, as the problem is complex and the results are far from satisfactory. After this introductory section, the chapter is organized as follows: In Sect. 17.1 we discuss the problem of protein fold prediction, protein database, and its extracted feature vectors. Section 17.2 describes feature selection and classification using SVM and fused hybrid classifiers, while Sect. 17.4 presents the experimental results. Section 17.5 discusses experimental results, including conclusions and future work.

Lavneet Singh, Girija Chetty

18. Kernel Methods and Applications in Bioinformatics

The kernel technique is a powerful tool for constructing new pattern analysis methods. Kernel engineering provides a general approach to incorporating domain knowledge and dealing with discrete data structures. Kernel methods, especially the support vector machine (SVM), have been extensively applied in the bioinformatics field, achieving great successes. Meanwhile, the development of kernel methods has also been strongly driven by various challenging bioinformatic problems. This chapter aims to give a concise and intuitive introduction to the basic principles of the kernel technique, and demonstrate how it can be applied to solve problems with uncommon data types in bioinformatics. Section 18.1 begins with the product features to give an intuitive idea of kernel functions, then presents the definition and some properties of kernel functions, and then devotes a subsection to a brief review of kernel engineering and its applications to bioinformatics. Section 18.2 describes the standard SVM algorithm. Finally, Sect. 18.3 illustrates how kernel methods can be used to address the peptide identification and the protein homology prediction problems in bioinformatics, while Sect. 18.4 concludes.
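The link between product features and kernel functions can be made concrete with a small example. Taking "product features" to mean all ordered degree-2 products of the input coordinates (an assumption about the chapter's usage), the dot product of two such feature vectors equals the square of the ordinary dot product, so the kernel gives the same value without ever building the features. The function names below are illustrative.

```python
from itertools import product

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def product_features(x):
    """Explicit feature map: all ordered degree-2 products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]

def poly2_kernel(x, z):
    """Homogeneous quadratic kernel, computed directly in the input space."""
    return dot(x, z) ** 2

x, z = [1, 2, 3], [4, 5, 6]
explicit = dot(product_features(x), product_features(z))  # works in 9 dimensions
implicit = poly2_kernel(x, z)                             # works in 3 dimensions
print(explicit, implicit)  # prints 1024 1024
```

For d-dimensional inputs the explicit map has d squared entries, while the kernel needs only one d-dimensional dot product; this saving is what makes kernels practical for high-dimensional and structured data.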

Yan Fu

Modeling Regulatory Networks: The Systems Biology Approach

Frontmatter

19. Path Finding in Biological Networks

Understanding cellular behavior from a systems perspective requires the identification of functional and physical interactions among diverse molecular entities in a cell (i.e., DNA/RNA, proteins, and metabolites). The most straightforward way to represent such datasets is by means of molecular networks whose nodes correspond to molecular entities and whose edges correspond to the interactions among those entities. Now that large amounts of interaction data are being generated, genome-wide networks can be created for an increasing number of organisms. These networks can be exploited to study a molecular entity such as a protein in a wider context than just in isolation, and they provide a way of representing our knowledge of the system as a whole. On the other hand, viewing a single entity or an experimental dataset in the light of an interaction network can reveal previously unknown insights into biological processes.

In this chapter we focus on different approaches that have been developed to reveal the functional state of a network, or to find an explanation for the observations in functional data through paths in the network. In addition we give an overview of the different omics datasets and data-integration techniques that can be used to build integrated biological networks.
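As a rough illustration of what a path in such a network looks like computationally, the sketch below runs a breadth-first search on a tiny undirected interaction network. It is not taken from the chapter: the function name `shortest_path`, the node names, and the toy edges are all hypothetical, and real path-finding methods typically weight edges by interaction confidence rather than treating them as equal.

```python
from collections import deque


def shortest_path(network, source, target):
    """Return one shortest path between two nodes of an unweighted network.

    `network` maps each node to the set of its neighbors (hypothetical format).
    Returns None when either node is missing or no path exists.
    """
    if source not in network or target not in network:
        return None
    predecessor = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = predecessor[node]
            return path[::-1]
        for neighbor in sorted(network[node]):  # sorted for reproducible output
            if neighbor not in predecessor:
                predecessor[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy network: a regulator, a gene, two proteins, and a metabolite.
toy_network = {
    "TF1": {"geneA", "protB"},
    "geneA": {"TF1", "protC"},
    "protB": {"TF1", "protC"},
    "protC": {"geneA", "protB", "metD"},
    "metD": {"protC"},
}

path = shortest_path(toy_network, "TF1", "metD")
print(path)
```

Breadth-first search is chosen here only because it is the simplest way to find a shortest path when all edges count equally; it says nothing about which of several equally short paths is biologically most plausible.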

Lore Cloots, Dries De Maeyer, Kathleen Marchal

20. Inferring Transcription Networks from Data

Reverse engineering of transcription networks is a challenging bioinformatics problem. Ordinary differential equation (ODE) network models have their roots in the physicochemical basis of these networks, but are difficult to build conventionally. Modeling automation is needed, and knowledge discovery in data using computational intelligence methods is a solution. The authors have developed a methodology for automatically inferring ODE system models from omics data, based on genetic programming (GP), and illustrate it on a real transcription network. The methodology allows the network to be decomposed from the complex of interacting cellular networks and to further decompose each of its nodes, without destroying their interactions. The structure of the network is not imposed but discovered from data, and further assumptions can be made about the parametersʼ values and the mechanisms involved. The algorithms can deal with unmeasured regulatory variables, like transcription factors (TFs) and microRNA (miRNA or miR). This is possible by introducing the regulome probabilities concept and the techniques to compute them. They are based on the statistical thermodynamics of regulatory molecular interactions. Thus, the resultant models are mechanistic and theoretically founded, not merely data fittings. To our knowledge, this is the first reverse engineering approach capable of dealing with missing variables, and the accuracy of all the models developed is greater than 99%.

Alexandru G. Floares, Irina Luludachi

21. Computational Methods for Analysis of Transcriptional Regulation

Understanding the mechanisms of transcriptional regulation is a key step in understanding many biological processes. Many computational algorithms have been developed to tackle this problem by identifying (1) the binding motifs, (2) binding sites, and (3) regulatory targets of given transcription factors. In this chapter, we survey the scope of currently used methods and algorithms for solving each of the above subproblems. We also focus on the newer subarea of machine learning (ML) methods, which have introduced a framework for a new set of approaches to solving these problems. The connections between these machine learning algorithms and conventional position weight matrix (PWM)-based algorithms are also highlighted, with the suggestion that ML algorithms can often generalize and expand the capabilities of existing methods.
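For readers unfamiliar with PWM-based scanning, the following sketch shows the general idea under simplifying assumptions that are not from the chapter: a uniform background of 0.25 per base, a pseudocount of 1, and four made-up aligned binding sites. The function names `build_pwm` and `scan` are hypothetical.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, log-odds score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


# Made-up aligned binding sites, used only for illustration.
sites = ["TATA", "TATA", "TAAA", "TATT"]
pwm = build_pwm(sites)
scores = scan("GGTATACC", pwm)
best_start, best_score = max(scores, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

The highest-scoring window is the candidate binding site; in practice a score threshold and a realistic background model would be needed before calling any window a site.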

Yue Fan, Mark Kon, Charles DeLisi

22. Inferring Genetic Networks with a Recurrent Neural Network Model Using Differential Evolution

In this chapter, we present an evolutionary approach for reverse-engineering gene regulatory networks (GRNs) from the temporal gene expression profile. The regulatory interaction among genes is modeled by the recurrent neural network (RNN) formalism. We used the differential evolution (DE) algorithm with a random restart strategy for inferring the underlying network structure as well as the regulatory parameters. The random restart mechanism is particularly useful for avoiding premature convergence and is hence expected to be valuable in recovering important regulations from noisy gene expression data. The algorithm has been applied for inferring regulation by analyzing gene expression data generated from both in silico and in vivo networks. We also investigate the effectiveness of the method in obtaining an acceptable network from a limited amount of noisy data.

Nasimul Noman, Leon Palafox, Hitoshi Iba

23. Structural Pattern Discovery in Protein–Protein Interaction Networks

Most proteins in a cell do not act in isolation, but carry out their function through interactions with other proteins. Elucidating these interactions is therefore central for our understanding of cellular function and organization. Recently, experimental techniques have been developed that allow us to measure protein interactions on a genomic scale for several model organisms. These datasets have a natural representation as weighted graphs, also known as protein–protein interaction (PPI) networks. This chapter will present some recent advances in computational methods for the analysis of these networks, which are aimed at revealing their structural patterns. In particular, we shall focus on methods for uncovering modules that correspond to protein complexes, and on random graph models, which can be used to de-noise large-scale PPI networks. In Sect. 23.1, the state-of-the-art techniques and algorithms are described, followed by the definition of measures to assess the quality of the predicted complexes and the presentation of a benchmark of the detection algorithms on four PPI networks. Section 23.2 moves beyond protein complexes and explores other structural patterns of protein–protein interaction networks using random graph models.

Tamás Nepusz, Alberto Paccanaro

24. Molecular Networks – Representation and Analysis

Molecular networks, their representation and analysis have attracted increasing interest in recent years. Although the importance of molecular networks has been recognized for a long time, only the advent of new technologies during the last two decades has delivered the necessary data for a systematic study of molecular networks and their complex behavior. Especially the surge of genome-wide data as well as the increase in computational power have contributed to establishing network and systems biology as new paradigms. The conceptual framework is generally based on an integrated approach of computational and experimental methods. In this chapter, we introduce basic concepts and outline mathematical formalisms for representing and analyzing molecular networks. In particular, we review the study of transcriptional regulatory networks in prokaryotes and of protein interaction networks in humans as prime examples of network-orientated approaches to complex systems. The chapter is concluded with a discussion of current challenges and future directions of network biology.

Miguel A. Hernandez-Prieto, Ravi K.R. Kalathur, Matthias E. Futschik

25. Whole-Exome Sequencing Data – Identifying Somatic Mutations

The use of next-generation sequencing instruments to study hematological malignancies generates a tremendous amount of sequencing data. This leads to a challenging bioinformatics problem: to store, manage, and analyze terabytes of sequencing data, often generated from extremely different data sources. Our project is mainly focused on sequence analysis of human cancer genomes, in order to identify the genetic lesions underlying the development of tumors. However, the automated detection procedure of somatic mutations and the statistical testing procedure to identify genetic lesions are still open problems. Therefore, we propose a computational procedure to handle large-scale sequencing data in order to detect exonic somatic mutations in a tumor sample. The proposed pipeline includes several steps based on open-source software and the R language: alignment, detection of mutations, annotation, functional classification, and visualization of results. We analyzed Illumina whole-exome sequencing data from five leukemic patients and five paired controls plus one colon cancer sample and paired control. The results were validated by Sanger sequencing.

Roberta Spinelli, Rocco Piazza, Alessandra Pirola, Simona Valletta, Roberta Rostagno, Angela Mogavero, Manuela Marega, Hima Raman, Carlo Gambacorti-Passerini

Bioinformatics Databases and Ontologies

Frontmatter

26. Biological Databases

Biological databases constitute the data layer of molecular biology and bioinformatics and are becoming a central component of some emerging fields such as clinical bioinformatics, and translational and personalized medicine. The building of biological databases has been conducted either considering the different representations of molecular entities, such as sequences and structures, or more recently by taking into account high-throughput platforms used to investigate cells and organisms, such as microarray and mass spectrometry technologies. This chapter provides an overview of the main biological databases currently available and underlines open problems and future trends.

This chapter reports on examples of existing biological databases with information about their use and application for the life sciences. We cover examples in the areas of sequence, interactomics, and proteomics databases. In particular, Sect. 26.1 discusses sequence databases, Sect. 26.2 presents structure databases including protein contact maps, Sect. 26.3 introduces a novel class of databases representing the interactions among proteins, Sect. 26.4 describes proteomics databases, an area of biological databases that is being continuously enriched by proteomics experiments, and finally Sect. 26.5 concludes the chapter by underlining future developments and the evolution of biological databases.

Mario Cannataro, Pietro H. Guzzi, Giuseppe Tradigo, Pierangelo Veltri

27. Ontologies for Bioinformatics

This chapter provides an introduction to ontologies and their application in bioinformatics.

It presents an overview of the range of information artifacts that are denoted as ontologies in this field, from controlled vocabularies to rich axiomatizations. It then focuses on the conceptual nature of ontologies and introduces the role of upper ontologies in the conceptualization process.

Languages and technologies that underpin the definition and usage of ontologies are then presented, with a particular focus on the ones derived from the semantic web framework. One objective of this chapter is to provide a concise and effective understanding of how technologies and concepts such as ontologies, RDF, OWL, SKOS, reasoning and Linked-Data relate to each other. The chapter is then complemented by a bioinformatics section (Sect. 27.4), both via an overview of the evolution of ontologies in this discipline, and via a more detailed presentation of a few notable examples such as gene ontologies (and the OBO family), BioPAX and pathway ontologies and UMLS. Finally, the chapter presents examples of a few areas where ontologies have found a significant usage in bioinformatics: data integration, information retrieval and data analysis (Sect. 27.5). This last section briefly lists some tools exploiting the information contained in biomedical ontologies when paired with the output of high-throughput experiments such as cDNA microarrays.

Andrea Splendiani, Michele Donato, Sorin Drăghici

Bioinformatics in Medicine, Health and Ecology

Frontmatter

28. Statistical Signal Processing for Cancer Stem Cell Formation

Many mysteries related to the behavior of cancer stem cells (CSCs), their role in the formation of tumors, and the evolution of tumors with time remain unresolved. Biologists conduct experiments and collect data from them; these are then used for modeling, inference, and prediction of various unknowns that define the CSC system and are vital for its understanding. The aim of this chapter is to provide a summary of our progress in statistical signal processing models and methods for the advancement of the theory and understanding of cancer and the CSC paradigm. The chapter comprises three parts: model building, methods for the forward problem, and methods for the inverse problem.

Monica F. Bugallo, Petar M. Djurić

29. Epigenetics

In the past, the term epigenetics was used to describe all biological phenomena that do not follow normal genetic rules. Currently, it is generally accepted that epigenetics refers to the heritable modifications of the genome that do not involve changes in the primary DNA sequence. Important epigenetic events include DNA methylation, covalent post-transcriptional histone modifications, RNA-mediated silencing, and nucleosome remodeling. These epigenetic inheritances take place in the chromatin-mediated control of gene expression and are responsible for chromatin structure stability, genome integrity, modulation of the expression of tissue-specific genes, and embryonic development, which are essential mechanisms allowing the stable propagation of gene activity states from one generation of cells to the next. Importantly, in recent years, epigenetic events have come to be considered key mechanisms in the regulation of critical biological processes and in the development of human diseases. From this point of view, the importance of epigenetic events in the control of both normal cellular processes and altered events associated with diseases has led to epigenetics being considered as a new frontier in cancer research.

Micaela Montanari, Marcella Macaluso, Antonio Giordano

30. Dynamics of Autoimmune Diseases

Autoimmune diseases are due to the immune response of the human body, which reacts against substances or tissues of the body. Lupus is a systemic autoimmune disease, or autoimmune connective tissue disease, affecting any part of the human body. In this chapter we study the dynamics of autoimmune diseases using a control systems approach. We investigate how the drug scheduling framework previously developed by the authors can control the autoimmune phenomenon. The main purpose of this work is to demonstrate how available tools are capable of treating the autoimmune disease. We employ drug therapy as a control input and explore how it can contribute to treat autoimmune diseases. In particular, we study a model describing an autoimmune disease with a control input given by a lupus treatment drug, belimumab. We conduct additional modeling work since the models in the literature do not capture the explicit relation between autoimmunity and the drug therapy by belimumab. We also examine which part of the model can be controlled via manipulation of drug dosage and derive a control method with which to treat autoimmune inflammation related to autoreactive B cells.

In Sect. 30.1 we give a brief introduction to lupus, because we study a model in which the control input is a newly developed and approved lupus treatment drug. In Sect. 30.2 we recall the model in [30.1] and conduct additional modeling work, since the current models do not capture the explicit relation between drug therapy and autoimmunity. We also examine which part of the model could be used as a control input and derive a control method. Section 30.3 describes the proposed control ideas and develops the control procedure for the model, which can treat autoimmune inflammation by means of controlling autoreactive B cells. Finally, we discuss future work and present further remarks in Sect. 30.4.

Hyeygjeon Chang, Alessandro Astolfi

31. Nutrigenomics

The entire complex of molecular processes of the human organism results from endogenous physiological execution of the information encoded in the genome but is also influenced by exogenous factors, which include those originating from nutrition as major agents. The assimilation of nutrient molecules within the human body continuously allows homeostatic reconstitution of its qualitative and quantitative composition but also takes part in physiological changes of body growth and adaptation to particular situations. Nevertheless, in addition to replacing material and energetic losses, nutritional intake also provides bioactive molecules, which are selectively able to modulate specific metabolic pathways, noticeably affecting the risk of cardiovascular and neoplastic diseases, which are the major cause of mortality in developed countries. Numerous bioactive nutrients are being progressively identified and their chemopreventive effects are being described at clinical and molecular mechanism levels. All omics technologies (such as transcriptomics, proteomics, and metabolomics) allow systematic analyses to study the effect of dietary bioactive molecules on the totality of molecular processes.

Since each nutrient might also have specific effects on individually different genomes, nutrigenomic and nutrigenetic analysis data can be distinguished by two different observational views: 1) the effects of the whole diet and of specific nutrients on genes, proteins, metabolic pathways, and metabolites; and 2) the effects of specific individual genomes on the biological activity of nutritional intake and of specific nutrients. Nutrigenomic knowledge of physiologic status and disease risk will enable the development of better diagnostic procedures as well as new therapeutic strategies specifically targeted to nutritionally relevant processes.

Hylde Zirpoli, Mariella Caputo, Mario F. Tecce

32. Bioinformatics and Nanotechnologies: Nanomedicine

In this chapter we focus on the bioinformatics strategies for translating genome-wide expression analyses into clinically useful cancer markers with a specific focus on breast cancer with a perspective on new diagnostic device tools coming from the field of nanobiotechnology and the challenges related to high-throughput data integration, analysis, and assessment from multiple sources.

Great progress in the development of molecular biology techniques has been seen since the discovery of the structure of deoxyribonucleic acid (DNA) and the implementation of the polymerase chain reaction (PCR) method. This started a new era of research on the structure of nucleic acid molecules, the development of new analytical tools, and DNA-based analyses that allowed the sequencing of the human genome, the completion of which has led to intensified efforts toward comprehensive analysis of mammalian cell structure and metabolism in order to better understand the mechanisms that regulate normal cell behavior and identify the gene alterations responsible for a broad spectrum of human diseases, such as cancer, diabetes, cardiovascular diseases, neurodegenerative disorders, and others.

Federico Ambrogi, Danila Coradini, Niccolò Bassani, Patrizia Boracchi, Elia M. Biganzoli

33. Personalized Information Modeling for Personalized Medicine

Personalized modeling offers a new and effective approach for the study of pattern recognition and knowledge discovery, especially for biomedical applications. The created models are very useful and informative for analyzing and evaluating an individual data object for a given problem. Such models are also expected to achieve a higher degree of accuracy of prediction of outcome or classification than conventional systems and methodologies. Motivated by the concept of personalized medicine and utilizing transductive reasoning, personalized modeling was recently proposed as a new method for knowledge discovery in biomedical applications. Personalized modeling aims to create a unique computational diagnostic or prognostic model for an individual. Here we introduce an integrated method for personalized modeling that applies global optimization of variables (features) and an appropriate neighborhood size to create an accurate personalized model for an individual. This method creates an integrated computational system that combines different information processing techniques, applied at different stages of data analysis, e.g., feature selection, classification, discovering the interaction of genes, outcome prediction, personalized profiling and visualization, etc. It allows for adaptation, monitoring, and improvement of an individualʼs model and leads to improved accuracy and unique personalized profiling that could be used for personalized treatment and personalized drug design.

Yingjie Hu, Nikola Kasabov, Wen Liang

34. Health Informatics

Computers have been used in healthcare for many years for administrative, clinical, and research purposes. Health informatics is concerned with the use of data for the management of disease and the healthcare process. Increasingly, health informatics is using data and approaches developed for bioinformatics and vice versa, and there are many areas where computational intelligence has the potential to make a useful contribution to health informatics. Health informatics is both a practical profession and an area of research. This chapter deals with the organization of healthcare, areas of development of health informatics in recent times, and some active areas of research that may be relevant.

David Parry

35. Ecological Informatics for the Prediction and Management of Invasive Species

Ecologists face rapidly accumulating environmental data from spatial studies and from large-scale field experiments such that many now specialize in information technology. Those scientists carry out interdisciplinary research in what is known as ecological informatics. Ecological informatics is defined as a discipline that brings together ecology and computer science to solve problems using biologically-inspired computation, information processing, and other computer science disciplines such as data management and visualization. Scientists working in the discipline have research interests that include ecological knowledge discovery, clustering, and forecasting, and simulation of ecological dynamics by individual-based or agent-based models, as well as hybrid models and artificial life. In this chapter, ecological informatics techniques are applied to answer questions about alien invasive species, in particular, species that pose a biosecurity threat in a terrestrial ecological setting. Biosecurity is defined as the protection of a regionʼs environment, flora and fauna, marine life, indigenous resources, and human and animal health. Because biological organisms can cause billions of dollars of impact in any country, good science, systems, and protocols that underpin a regulatory biosecurity system are required in order to facilitate international trade. The tools and techniques discussed in this chapter are designed to be used in a risk analysis procedure so that agencies in charge of biosecurity can prioritize scarce resources and effort and be better prepared to prevent unexpected incursions of dangerous invasive species. The methods are used to predict (1) which species out of the many thousands might establish in a new area, (2) where those species might establish, and (3) where they might spread over a realistic landscape so that their impact can be determined.

Susan P. Worner, Muriel Gevrey, Takayoshi Ikeda, Gwenaël Leday, Joel Pitt, Stefan Schliebs, Snjezana Soltic

Understanding Information Processes in the Brain and the Nervous System

Frontmatter

36. Information Processing in Synapses

The synapse is a basic functional structure for information processing between neurons in the central nervous system, required for understanding of the functional properties of neural circuits and brain functions, and even the consciousness that emerges from them. There is now a wealth of experimental results concerning the detailed structure and functional properties of synapses on the basis of molecular biology, since the series of pioneering works by Bert Sakmann and Shosaku Numa in the 1980s that had been originally suggested and postulated by Bernard Katz and colleagues. With the introduction of more advanced research techniques, such as the patch-clamp method (in electrophysiology), two-photon confocal laser microscopy (in imaging), and molecular biological methods, into the research field of synaptic physiology, understanding of the functional significance of synapses has advanced enormously at the molecular level, with fruitful results. Furthermore, emerging new techniques with the invention of noninvasive whole-brain imaging methods (functional magnetic resonance imaging (fMRI), etc.) make it necessary for researchers to have a deep understanding of the relationship between the microscopic physiological phenomena and the higher brain functions composed of and elicited from neural networks. Quantitative expressions of electrical and chemical signal transactions carried out within neural networks allow investigators in other fields such as engineering, computer science, and applied physics to treat these biological mechanisms mathematically for computational neuroscience.

In this chapter, the physiology, biophysics, and pharmacology of the information processes of synaptic transmission in the central nervous system are presented to provide the necessary background knowledge to researchers in these fields. In particular, electrophysiological results regarding receptors and ion channels, which play important roles in synaptic transmission, are presented from the viewpoint of biophysics. Moreover, one of the most advanced techniques, namely fast multiple-point photolysis by two-photon laser beam activation of receptors in the membrane of a single neuron and neural tissues on the submicron level, is introduced together with results obtained in the authorsʼ experiments.

Hiroshi Kojima

37. Computational Modeling with Spiking Neural Networks

This chapter reviews recent developments in the area of spiking neural networks (SNN) and summarizes the main contributions to this research field. We give background information about the functioning of biological neurons, discuss the most important mathematical neural models along with neural encoding techniques, learning algorithms, and applications of spiking neurons. As a specific application, the functioning of the evolving spiking neural network (eSNN) classification method is presented in detail and the principles of numerous eSNN based applications are highlighted and discussed.

Stefan Schliebs, Nikola Kasabov

38. Statistical Methods for fMRI Activation and Effective Connectivity Studies

Functional magnetic resonance imaging (fMRI) is a technique to indirectly measure activity in the brain through the flow of blood. fMRI has been a powerful tool in helping us gain a better understanding of the human brain since it appeared over 20 years ago. However, fMRI poses many challenges for engineers. In particular, detecting and interpreting the blood oxygen level-dependent (BOLD) signals on which fMRI is based is a challenge. For example, fMRI activation may be caused by a local neural population (activation detection) or by a distant brain region (effective connectivity). Although many advanced statistical methods have been developed for fMRI data analysis, many problems are still being addressed to maximize the accuracy in activation detection and effective connectivity analysis. This chapter presents general statistical methods for activation detection and effective connectivity analysis in fMRI scans of the human brain.

A linear regression model for activation detection is introduced (Sect. 38.2.1), and a detailed statistical inference method for activation detection (Sect. 38.2.2) when applying an autoregression model to correct residual terms of the linear model is presented in Sect. 38.2.3. We adopt a two-stage mixed model to combine different subjects (Sect. 38.3) for second-level data analysis. To estimate the variance for the mixed model, a modified expectation-maximization algorithm is employed. Finally, due to the false positives in the activation map, a Bonferroni-related threshold correction method is developed to control the false positives (Sect. 38.3.3). An fMRI dataset from retinotopic mapping experiments was employed to test the feasibility of the methods for both first- and second-level analysis.
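To make the idea of regression-based activation detection concrete, here is a deliberately simplified sketch for a single voxel. It is not the chapter's model: it fits an ordinary least-squares line to a boxcar task regressor, omits the hemodynamic response and the autoregressive residual correction, and uses the plain Bonferroni rule (significance level divided by the number of tests) rather than the Bonferroni-related method developed in the chapter. All names and the simulated signal are hypothetical.

```python
import math


def voxel_t_statistic(signal, regressor):
    """Fit signal = intercept + slope * regressor by least squares.

    Returns the slope (estimated activation effect) and its t statistic.
    """
    n = len(signal)
    x_mean = sum(regressor) / n
    y_mean = sum(signal) / n
    sxx = sum((x - x_mean) ** 2 for x in regressor)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(regressor, signal))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(regressor, signal))
    standard_error = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 d.o.f.
    return slope, slope / standard_error


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Simulated block design: 4 rest scans then 4 task scans, repeated three times.
boxcar = [0, 0, 0, 0, 1, 1, 1, 1] * 3
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1] * 3
active_voxel = [100 + 2.0 * x + e for x, e in zip(boxcar, noise)]

slope, t_value = voxel_t_statistic(active_voxel, boxcar)
per_voxel_alpha = bonferroni_alpha(0.05, 10000)  # 10000 voxels is an assumed count
print(round(slope, 3), round(t_value, 1), per_voxel_alpha)
```

With a binary regressor the fitted slope is simply the difference between the mean task signal and the mean rest signal, which is why a large t statistic marks a voxel as a candidate activation before any multiple-comparison correction is applied.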

In Sect. 38.4, we present a nonlinear system identification method (NSIM) for effective connectivity study. The mathematical theory of the method is presented in Sect. 38.4.1. An F statistical test is proposed to quantify the magnitude of the relationship based on two-connection and three-connection visual networks (Sect. 38.4.3). To circumvent the limitation of the model overfitting in NSIM, a model selection algorithm is suggested subsequently. In the model selection, we propose a nongreedy search method, e.g., least angle regression for effective connectivity analysis (Sects. 38.4.7 and 38.4.8). Three datasets obtained from standard block and random block designs are used to verify the method, and we outline some research directions and conclude our study in Sect. 38.5.

Xingfeng Li, Damien Coyle, Liam Maguire, T. Martin McGinnity

39. Neural Circuit Models and Neuropathological Oscillations

Degeneration of cognitive functioning due to dementia is among the most important health problems in the ageing population and society today. Alzheimerʼs disease (AD) is the most common cause of dementia, affecting more than 5 million people in Europe, with the global prevalence of AD predicted to quadruple to 106 million by 2050. This chapter focuses on models of neural circuitry and brain structures affected during neurodegeneration as a result of AD, and on how these models can be employed to better understand how changes in the physical basis of the electrochemical interactions at the neuron/synapse level are revealed at the neural population level. The models are verified using known and observed neuropathological oscillations in AD. The thalamus plays a major role in generating many rhythmic brain oscillations, yet little is known about the role of the thalamus in neurodegeneration and whether thalamus atrophy is a primary or secondary phenomenon to hippocampal or neocortical loss in AD. Neural mass models of thalamocortical networks are presented to investigate the role these networks have in the alterations of brain oscillations observed in AD. Whilst neural mass models offer many insights into thalamocortical circuitry and rhythm generation in the brain, they are not suitable for elucidating changes in synaptic processes and individual synaptic loss at the microscopic scale. There is significant evidence that AD is a synaptic disease. A model consisting of multiple Izhikevich-type neurons elucidates how large-scale networks of simple neurons can shed light on the relationship between synaptic/neuron degradation/loss and neural network oscillations. Focusing on thalamocortical circuitry may help explain oscillatory changes; however, the progression of AD is also usually associated with memory deficits, which implicates other brain structures such as the hippocampus.
A hippocampal computational model that allows investigation of how the hippocampo-septal theta rhythms can be altered by beta-amyloid peptide (Aβ, a main marker of AD) is also described. In summary, the chapter presents three different computational models of neural circuitry at different scales/brain regions and demonstrates how these models can be used to fill some of the gaps in our knowledge of brain oscillations and of how the symptoms associated with AD arise from the electrochemical interactions in neurobiology and neural populations.
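For readers unfamiliar with the Izhikevich neuron mentioned in this abstract, the following is a minimal sketch of a single neuron integrated with the forward Euler method. It uses the commonly published regular-spiking parameters (a = 0.02, b = 0.2, c = -65, d = 8). The chapter's network model, its parameters, and its synaptic-loss mechanism are not described here, so the function name, the input current, and the step size below are assumptions made for illustration only.

```python
def simulate_izhikevich(current, t_max_ms=1000.0, dt_ms=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron and return its spike times in ms.

    Model equations:
        dv/dt = 0.04*v^2 + 5*v + 140 - u + I
        du/dt = a*(b*v - u)
        if v >= 30 mV: v <- c, u <- u + d
    """
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spike_times = []
    n_steps = int(t_max_ms / dt_ms)
    for step in range(n_steps):
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt_ms * dv
        u += dt_ms * du
        if v >= 30.0:                      # spike: record time and reset
            spike_times.append(step * dt_ms)
            v = c
            u += d
    return spike_times


# Hypothetical constant input currents (illustrative only).
spikes_driven = simulate_izhikevich(current=10.0)
spikes_silent = simulate_izhikevich(current=0.0)
```

With a constant input the neuron fires repeatedly, while without input it settles to rest. A network study of the kind described above would couple many such units through synapses, which this sketch omits.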

Damien Coyle, Basabdatta S. Bhattacharya, Xin Zou, KongFatt Wong-Lin, Kamal Abuhassan, Liam Maguire

40. Understanding the Brain via fMRI Classification

In this chapter, we present investigations on magnetic resonance imaging (MRI) of various brain states, extracting the most significant features in order to classify brain images as normal or abnormal. We describe a novel method based on the wavelet transform to initially decompose the images, followed by the use of various feature selection algorithms to extract the most significant brain features from the MRI images. This chapter demonstrates the use of different classifiers to detect abnormal brain images from a publicly available neuroimaging dataset. A wavelet-based feature extraction followed by selection of the most significant features using principal component analysis (PCA)/quadratic discriminant analysis (QDA), with classification using learning-based classifiers, results in a significant improvement in accuracy as compared with previously reported studies and in a better understanding of brain abnormalities.

Lavneet Singh, Girija Chetty

Advanced Signal Processing Methods for Brain Signal Analysis and Modeling

Frontmatter

41. Nonlinear Adaptive Filtering in Kernel Spaces

Recently, a family of online kernel-learning algorithms, known as the kernel adaptive filtering (KAF) algorithms, has become an emerging area of research. The KAF algorithms are developed in reproducing kernel Hilbert spaces (RKHS), by using the linear structure of this space to implement well-established linear adaptive algorithms and to obtain nonlinear filters in the original input space. These algorithms include the kernel least mean squares (KLMS), kernel affine projection algorithms (KAPA), kernel recursive least squares (KRLS), and extended kernel recursive least squares (EX-KRLS), etc. When the kernels are radial (such as the Gaussian kernel), they naturally build a growing RBF network, where the weights are directly related to the errors in each sample. The aim of this chapter is to give a brief introduction to kernel adaptive filters. In particular, our focus is on KLMS, the simplest KAF algorithm, which is easy to implement, yet efficient. Several key aspects of the algorithm are discussed, such as self-regularization, sparsification, quantization, and the mean-square convergence. Application examples are also presented, including in particular the adaptive neural decoder for spike trains.
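The growing RBF network described above can be sketched in a few lines. The following is a minimal KLMS sketch for scalar inputs with a Gaussian kernel. Each new sample becomes a center whose weight is the step size times the prediction error at that sample. The function names, the step size, the kernel width, and the toy sine-regression data are assumptions for illustration, and no sparsification or quantization is included.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (radial) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def klms(inputs, targets, step_size=0.5, sigma=0.5):
    """Run kernel least mean squares online over (input, target) pairs.

    Returns the centers, their coefficients, and the a priori errors.
    """
    centers, coeffs, errors = [], [], []
    for x, d in zip(inputs, targets):
        # Prediction of the current network before seeing the target.
        y = sum(a * gaussian_kernel(c, x, sigma)
                for a, c in zip(coeffs, centers))
        e = d - y
        # Grow the network: the new weight is proportional to the error.
        centers.append(x)
        coeffs.append(step_size * e)
        errors.append(e)
    return centers, coeffs, errors


# Toy problem (illustrative only): learn d = sin(x) on [-3, 3].
inputs = [((i * 0.37) % 1.0) * 6.0 - 3.0 for i in range(200)]
targets = [math.sin(x) for x in inputs]
centers, coeffs, errors = klms(inputs, targets)

early_mse = sum(e * e for e in errors[:50]) / 50
late_mse = sum(e * e for e in errors[-50:]) / 50
```

Comparing `early_mse` with `late_mse` shows the prediction error shrinking as centers accumulate. Because the network grows by one unit per sample, practical use typically needs one of the sparsification or quantization schemes the chapter discusses.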

Badong Chen, Lin Li, Weifeng Liu, José C. Príncipe

42. Recurrence Plots and the Analysis of Multiple Spike Trains

Spike trains are difficult to analyze and compare because they are point processes, for which relatively few methods of time series analysis exist. Recently, several distance measures between pairs of spike train windows (segments) have been proposed. Such distance measures allow one to draw recurrence plots, two-dimensional graphs for visualizing dynamical changes of time series data, which in turn allows investigation of many spike train properties, such as serial dependence, chaos, and synchronization. Here, we review some definitions of distances between windows of spike trains, explain methods developed on recurrence plots, and illustrate how these plots reveal spike train properties by analysis of simulated and experimental data.
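As an illustration of how a distance between spike train windows leads to a recurrence plot, the sketch below uses the Victor-Purpura edit distance, one well-known spike train metric. The abstract does not say which distances the chapter reviews, so this choice is an assumption, as are the function names, window length, threshold `eps`, and toy spike times.

```python
def victor_purpura(a, b, q=1.0):
    """Edit distance between two sorted spike-time lists.

    Inserting or deleting a spike costs 1; shifting a spike by dt costs q*|dt|.
    """
    n, m = len(a), len(b)
    # dp[i][j]: minimal cost to turn the first i spikes of a
    # into the first j spikes of b.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)
    for j in range(1, m + 1):
        dp[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j] + 1.0,
                           dp[i][j - 1] + 1.0,
                           dp[i - 1][j - 1] + q * abs(a[i - 1] - b[j - 1]))
    return dp[n][m]


def split_into_windows(spike_times, window, n_windows):
    """Cut a spike train into consecutive windows.

    Spike times inside each window are measured from the window start.
    """
    windows = [[] for _ in range(n_windows)]
    for t in spike_times:
        k = int(t // window)
        if 0 <= k < n_windows:
            windows[k].append(t - k * window)
    return windows


def recurrence_plot(windows, eps, q=1.0):
    """Binary matrix: entry (i, j) is 1 when windows i and j are within eps."""
    n = len(windows)
    return [[1 if victor_purpura(windows[i], windows[j], q) <= eps else 0
             for j in range(n)]
            for i in range(n)]


# Toy spike train whose pattern repeats every two windows (illustrative only).
spikes = [0.1, 0.5, 1.3, 2.1, 2.5, 3.3]
windows = split_into_windows(spikes, window=1.0, n_windows=4)
rp = recurrence_plot(windows, eps=0.2)
```

In this toy case the matrix `rp` has a checkerboard pattern, reflecting the period-two repetition of the spike pattern across windows.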

Yoshito Hirata, Eric J. Lang, Kazuyuki Aihara

43. Adaptive Multiscale Time-Frequency Analysis

Time-frequency analysis techniques are now adopted as standard in many applied fields, such as bioinformatics and bioengineering, to reveal frequency-specific and time-locked event-related information of input data. Most standard time-frequency techniques, however, adopt fixed basis functions to represent the input data and are thus suboptimal. To address this, the empirical mode decomposition (EMD) algorithm has shown considerable prowess in the analysis of nonstationary data, as it offers a fully data-driven approach to signal processing. Recent multivariate extensions of the EMD algorithm, aimed at extending the framework to signals containing multiple channels, are even more pertinent in many real-world scenarios where multichannel signals are commonly obtained, e.g., electroencephalogram (EEG) recordings. In this chapter, the multivariate extensions of EMD are reviewed and it is shown how these extensions can be used to alleviate the long-standing problems associated with the standard (univariate) EMD algorithm. The ability of the multivariate extensions of EMD to serve as a powerful real-world data analysis tool is demonstrated via simulations on biomedical signals.

Naveed ur Rehman, David Looney, Cheolsoo Park, Danilo P. Mandic

Information Modeling of Perception, Sensation and Cognition

Frontmatter

44. Modeling Vision with the Neocognitron

The neocognitron, which was proposed by Fukushima [44.1], is a neural network model capable of robust visual pattern recognition. It acquires the ability to recognize patterns through learning.

The neocognitron is a hierarchical network consisting of many layers of neuron-like cells. Its architecture was originally suggested from neurophysiological findings on visual systems of mammals. There are bottom-up connections between cells in adjoining layers. Some of these connections are variable and can be modified by learning. The neocognitron can acquire the ability to recognize patterns by learning. Since it has a large power of generalization, presentation of only a few typical examples of deformed patterns (or features) is enough for learning. It is not necessary to present all of the deformed versions of the patterns that might appear in the future. After learning, the neocognitron can recognize input patterns robustly, with little effect from deformation, changes in size, or shifts in location. It is even able to correctly recognize a pattern that has not been presented before, provided that it resembles one of the training patterns.

The principle of the neocognitron can be used in various kinds of pattern recognition systems, such as recognizing handwritten characters. Further extensions and modifications of the neocognitron have been proposed to endow it with a function of selective attention, an ability to recognize partly occluded patterns, and so on.

Kunihiko Fukushima

45. Information Processing in the Gustatory System

Gustation is a sensory modality that is essential for survival. Information provided by the gustatory system enables identification and rejection of poisons and encourages ingestion of nutritious foods. In addition, fine taste discrimination, guided by postingestional feedback, is critical for the maintenance of energy homeostasis. The study of how gustatory information is processed by both the peripheral and the central nervous systems has been an active and fruitful area of investigation and has yielded some dramatic results in recent years. In this chapter, we will first discuss the general concept of neural coding of sensory information to provide a backdrop for the discussion of gustatory neural coding in particular. Next, the anatomy and organization of each level of the gustatory system from the tongue to the cortex will be presented in the context of various theories of sensory coding as they have been applied to gustation. Finally, we will review the unifying ideas that characterize information processing in the taste system.

Alexandra E. DʼAgostino, Patricia M. Di Lorenzo

46. EEG Signal Processing for Brain–Computer Interfaces

This chapter is focused on recent advances in electroencephalogram (EEG) signal processing for brain–computer interface (BCI) design. A general overview of BCI technologies is first presented, and then the protocol for motor imagery noninvasive BCI for mobile robot control is discussed. Our ongoing research on noninvasive BCI design based not on recorded EEG but on the brain sources that originated the EEG signal is also introduced. We propose a solution to EEG-based brain source recovery that combines two techniques: a sequential Monte Carlo method for source localization and spatial filtering by beamforming for the respective source signal estimation. The EEG inverse problem has previously been studied under the assumption that the source localization is known. In this work, for the first time, the problem of inverse modeling is solved simultaneously with the problem of the respective source space localization.

Petia Georgieva, Filipe Silva, Mariofanna Milanova, Nikola Kasabov

47. Brain-like Information Processing for Spatio-Temporal Pattern Recognition

Information processes in the brain, such as gene and protein expression, learning, memory, perception, cognition, and consciousness, are all spatio- and/or spectro-temporal. Modelling such processes would require sophisticated information science methods, and the best ones could be the brain-inspired ones that use the same brain information processing principles. Spatio- and spectro-temporal data (SSTD) are also the most common types of data collected in many domain areas, including engineering, bioinformatics, neuroinformatics, ecology, environment, medicine, economics, etc. However, there is a lack of methods for the efficient analysis of such data and for spatio-temporal pattern recognition (STPR). The brain functions as a spatio-temporal information processing machine and deals extremely well with spatio-temporal data. Its organization and functions have been the inspiration for the development of new methods for SSTD analysis and STPR. Brain-inspired spiking neural networks (SNN) are considered the third generation of neural networks and are a promising paradigm for the creation of new intelligent ICT for SSTD. This new generation of computational models and systems is potentially capable of modeling complex information processes due to the ability to represent and integrate different information dimensions, such as time, space, frequency, and phase, and to deal with large volumes of data in an adaptive and self-organizing manner. This chapter reviews methods and systems of SNN for SSTD analysis and STPR, including single neuronal models, evolving spiking neural networks (eSNN), and computational neurogenetic models (CNGM). Software and hardware implementations and some pilot applications for audio-visual pattern recognition, EEG data analysis, cognitive robotic systems, BCI, neurodegenerative diseases, and others are discussed.

Nikola Kasabov

48. Neurocomputational Models of Natural Language

In this chapter I review computational models of the neural circuitry which implements the human capacity for language. These models cover many different aspects of the language capacity, from representations of phonology and word forms to representations of sentence meanings and syntactic structures. The computational models discussed are neural networks: structures of simple units which are densely interconnected and can be active in parallel. I review the computational properties of the networks introduced and also empirical evidence from different sources (neural imaging, behavioral experiments, patterns of impairment following brain dysfunction) which supports the models described.

Alistair Knott

Neuroinformatics Databases and Ontologies

Frontmatter

49. Ontologies and Machine Learning Systems

In this chapter we review the uses of ontologies within bioinformatics and neuroinformatics, the various attempts to combine machine learning (ML) and ontologies, and the uses of data mining ontologies. This is a diverse field and there is enormous potential for wider use of ontologies in bioinformatics and neuroinformatics research and system development. A systems biology approach comprising experimental and computational research using biological, medical, and clinical data is needed to understand complex biological processes, to help scientists draw meaningful inferences, and to answer questions scientists have not even attempted so far.

Shoba Tegginmath, Russel Pears, Nikola Kasabov

50. Integration of Large-Scale Neuroinformatics – The INCF

Understanding the human brain and its function in health and disease represents one of the greatest scientific challenges of our time. In the post-genomic era, an overwhelming accumulation of new data, at all levels of exploration from DNA to human brain imaging, has been acquired. This accumulation of facts has not given rise to a corresponding increase in the understanding of integrated functions in this vast area of research involving a large number of fields extending from genetics to psychology. Neuroinformatics is uniquely placed at the intersection between neuroscience and information technology, and emerges as an area of critical importance to facilitate the future conceptual development in neuroscience by creating databases which transcend different organizational levels and allow for the development of different computational models from the sub-cellular to the global brain level.

Raphael Ritz, Shiro Usui

Information Modeling for Understanding and Curing Brain Diseases

Frontmatter

51. Alzheimerʼs Disease

Alzheimerʼs disease (AD) is the most common neurodegenerative disorder in late life, clinically characterized by dementia and progressive cognitive impairments, with presently no effective treatment. This chapter summarizes progress achieved during the last decades in understanding the pathogenesis of AD. Based on the pathomorphological hallmarks (senile amyloid plaque deposits, occurrence of neurofibrillary tangles as hyperphosphorylated tau protein in cerebral cortex and hippocampus) and other consistent features of the disease (neurodegeneration, cholinergic dysfunction, vascular impairments), the major hypotheses of cause and development of the sporadic, not genetically inherited, AD are described. Finally, to reflect the disease in its entirety and internal connective relationships, the different pathogenetic hypotheses are tentatively combined to describe the interplay of the essential features of AD and their mutually influencing pathogenetic processes in a unified model. Such a unified approach may provide a basis to model pathogenesis and progression of AD by application of computational methods such as the recently introduced novel research framework for building probabilistic computational neurogenetic models (pCNGM) by Kasabov and coworkers [51.1].

Reinhard Schliebs

52. Integrating Data for Modeling Biological Complexity

This chapter describes how information relating to the interactions between molecules in complex biological pathways can be identified from the scientific literature and integrated into maps of functional relationships. The molecular biology of the amyloid precursor protein (APP) in the synaptic processes involved in normal cognitive function and neurodegenerative disease is used as a case study. The maps produced are interpreted with reference to basic concepts of biological regulation and control. Direct and indirect feedback relationships between the amyloid precursor protein, its proteolytic fragments and various processes that contribute to processes involved in synaptic modifications are identified. The contributions of the amyloid precursor protein and its proteolytic fragments are investigated with reference to disease pathways in Alzheimer disease and new perspectives on disease progression are highlighted. Mapping functional relationships in complex biological pathways is useful to summarize the current knowledge base, identify further targets for research, and for empirical experimental design and interpretation of results.

Sally Hunter, Carol Brayne

53. A Machine Learning Pipeline for Identification of Discriminant Pathways

Identifying the molecular pathways more prone to disruption during a pathological process is a key task in network medicine and, more generally, in systems biology. This chapter describes a pipeline that couples a machine learning solution for molecular profiling with a recent network comparison method. The pipeline can identify changes occurring between specific sub-modules of networks built in a case-control biomarker study, discriminating key groups of genes whose interactions are modified by an underlying condition. Different algorithms can be chosen to implement the workflow steps. Three applications on genome-wide data are presented regarding the susceptibility of children to air pollution, and early and late onset of Parkinsonʼs and Alzheimerʼs diseases.

Annalisa Barla, Giuseppe Jurman, Roberto Visintainer, Margherita Squillario, Michele Filosi, Samantha Riccadonna, Cesare Furlanello

54. Computational Neurogenetic Modeling: Gene-Dependent Dynamics of Cortex and Idiopathic Epilepsy

The chapter describes a novel computational approach to modeling the cortex dynamics that integrates gene–protein regulatory networks with a neural network model. Interaction of genes and proteins in neurons affects the dynamics of the whole neural network. We have adopted an exploratory approach of investigating many randomly generated gene regulatory matrices, out of which we kept those that generated interesting dynamics. This naïve brute-force approach served us to explore the potential application of computational neurogenetic models in relation to gene knock-out experiments. The knock-out of a hypothetical gene for fast inhibition in our artificial genome has led to an interesting neural activity. In spite of the fact that the artificial gene/protein network has been altered due to one gene knock-out, the dynamics of the SNN in terms of spiking activity was most of the time very similar to the result obtained with the complete gene/protein network. However, from time to time the neurons spontaneously and temporarily synchronized their spiking into coherent global oscillations. In our model, the fluctuations in the values of neuronal parameters lead to spontaneous development of seizure-like global synchronizations. These very same fluctuations also lead to termination of the seizure-like neural activity and maintenance of the inter-ictal normal periods of activity. Based on our model, we would like to suggest a hypothesis that parameter changes due to the gene–protein dynamics should also be included as a serious factor determining transitions in neural dynamics, especially when the cause of disease is known to be genetic.

Lubica Benuskova, Nikola Kasabov

55. Information Methods for Predicting Risk and Outcome of Stroke

Stroke is a major cause of disability and mortality in most economically developed countries. It is the second leading cause of death worldwide (after cancer and heart disease) [55.1, 2] and a major cause of disability in adults in developed countries [55.3]. Personalized modeling is an emerging effective computational approach, which has been applied to various disciplines, such as personalized drug design, ecology, business, and crime prevention; it has recently become more prominent in biomedical applications. Biomedical data on stroke risk factors and prognostic data are available in a large volume, but the data are complex and often difficult to apply to a specific person. Individualizing stroke risk prediction and prognosis will allow patients to focus on risk factors specific to them, thereby reducing their stroke risk and managing stroke outcomes more effectively. This chapter reviews various methods, both conventional statistical methods and computational intelligent modeling methods, for predicting risk and outcome of stroke.

Wen Liang, Rita Krishnamurthi, Nikola Kasabov, Valery Feigin

56. sEMG Analysis for Recognition of Rehabilitation Actions

Surface electromyography (sEMG), a measurement of biomedical electronic signals from the muscle surface using electrodes, shows the motor status of the nerve–muscle system and motor instruction information. Active motion intention and the motor status of impaired stroke patients can be acquired by sEMG. Currently, sEMG is widely used in prosthetic arm control, rehabilitation robot control, exoskeletal power assist robot control, tele-operated robots, virtual reality, and so on. The application of sEMG to a rehabilitation robot is studied in this chapter. sEMG is used to build an information channel between the patient and the robot to obtain biological feedback control during rehabilitation training. It is of great significance for the improvement of patientsʼ consciousness of active participation. It will also help to improve the evaluation and efficiency of rehabilitation. It establishes a general scheme for other applications related to the human–machine interface.

First, the generation mechanism and characteristic of the sEMG signal are presented in this chapter. Next, current feature analysis methods of sEMG are summarized. The advantages and disadvantages of each method are discussed. Then, an example of sEMG signal analysis for the recognition of rehabilitation actions is given. Finally, future work and discussions are given in the conclusion.

Qingling Li, Zeng-Guang Hou, Yu Song

Nature Inspired Integrated Information Technologies

Frontmatter

57. Brain-Like Robotics

This chapter aims to provide an overview of what is happening in the field of brain-like robotics, what the main issues are, and how they are being addressed by different authors. It starts by introducing several concepts and theories on the evolution and operation of the brain and provides a basic biological and operational framework as background to contextualize the topic. Building on these foundations, the main body of the chapter is devoted to the different contributions within the robotics community that use brain-like models as a source of inspiration for controlling real robots. These contributions are addressed from two perspectives. On the one hand, the main cognitive architectures developed under a more or less strict brain-like point of view are presented, offering a brief description of each architecture as well as highlighting some of their main contributions. Then the point of view is changed and a more extensive review is provided of what is being done within three areas that we consider key for the future development of autonomous brain-like robotic creatures that can live and work in human environments interacting with other robots and human beings. These are: Memory, Attention and Emotions. This review is followed by a description of some of the current projects that are being carried out or have recently finished within this field, as well as of some robotic platforms that are currently being used. The chapter is heavily referenced in the hope that this extensive compilation of papers and books from the different areas that are relevant within the field is useful for the reader to really appreciate its breadth and beauty.

Richard J. Duro, Francisco Bellas, José A. Becerra Permuy

58. Developmental Learning for User Activities

This chapter presents a brain-inspired developmental learning system. A personal computer "lives" with the human user as long as the power is on. It can develop and report some activities of the user like a "shadow" machine, a virtual machine that runs in the background while the human user is doing his or her regular activities, on the computer or off the computer. The goal of the teacher of this "shadow machine" is to enable it to observe human usersʼ status, recognize usersʼ activities, and provide the taught actions as desired reports. Both visual and acoustic contexts are used by this "shadow machine" to infer the userʼs activities (e.g., in an office). A major challenge is that the system must be applicable to open domains, without a handcrafted environmental model. That is, there is no handcrafted constraint on office lighting, size, or setting, nor any requirement to use a head-mounted close-talk microphone. A room microphone sits somewhere near the computer. The distance between the sound sources and the microphone varies significantly. This system is designed to respond to its sensory inputs. A more challenging issue is to make the system adapt to different users and different environments. Instead of building all the world knowledge in advance (which is intractable), the systemʼs adaptive capability enables it to learn sensorimotor association (which is tractable). The real-time prototype system has been tested in different office environments.

Xiao Huang, Juyang Weng, Zhengyou Zhang

59. Quantum and Biocomputing – Common Notions and Targets

Biocomputing and quantum computing are both relatively novel areas of information processing sciences under the umbrella of natural computing, established in the late twentieth century. From the practical point of view, one can say that in both bio and quantum paradigms the purpose is to replace the traditional media of computing by an alternative. Biocomputing is based on an appropriate treatment of biomolecules, and quantum computing is based on the physical realization of computation on systems so small that they must be described by using quantum mechanics. The efficiency of the proposed biomolecular computing is based on massive parallelism, which is implementable by already existing technology for small instances. In a sense, quantum computing also involves parallelism. From time to time, there are proposals or attempts to create a uniform approach to both biocomputational and quantum parallelism. The main purpose of this article is to explain why this is a very challenging task. To this end, we present the usual mathematical formalism needed to speak about quantum computing and compare quantum parallelism to its biomolecular counterpart.

Mika Hirvensalo

60. Brain, Gene, and Quantum Inspired Computational Intelligence

This chapter discusses opportunities and challenges for the creation of methods of computational intelligence (CI), and more specifically artificial neural networks (ANN), inspired by principles at different levels of information processing in the brain: cognitive, neuronal, genetic, and quantum. It mainly addresses the issues related to the integration of these principles into more powerful and accurate CI methods. It is demonstrated how some of these methods can be applied to model biological processes and to improve our understanding in the subject area, with generic CI methods being applicable to challenging generic AI problems. The chapter first offers a brief presentation of some principles of information processing at different levels of the brain and then presents brain-inspired, gene-inspired, and quantum-inspired CI. The main contribution of the chapter, however, is the introduction of methods inspired by the integration of principles from several levels of information processing, namely:

1. A computational neurogenetic model that in one model combines gene information related to spiking neuronal activities.

2. A general framework of a quantum spiking neural network (SNN) model.

3. A general framework of a quantum computational neurogenetic model (CNGM).

Many open questions and challenges are discussed, along with directions for further research.

Nikola Kasabov

61. The Brain and Creativity

Modern abstract art is considered complex and the extraction of meaning from some works of art is largely controversial. However, some artists have explicitly tried to produce paintings in accordance with specific goals. This means that behind their artwork there is a project realized through creativity. Their paintings clearly reflect those efforts and are able to show the emergence of complex ideas reproducing a non-linear and uncertain world. This chapter investigates the link between the brain states of a subject perceiving art and the complexity of the art. More than 25 paintings by famous artists of modern art are studied and evaluated. The concept of artistic complexity, C_A, has been introduced as a metric for assessing the complexity of paintings of different artists. The results achieved have been compared to the saliency maps earlier introduced in computer vision as computational models of bottom-up visual attention (VA). The measure proposed is based on an interplay between top-down and bottom-up approaches, manifesting the difficulty of the human brain in extracting invariants from some abstract representations. The intriguing relationships shown may offer a paradigm for testing novel computational models on brain-like machines. The methodologies described are likely to be of interest for multimedia quality assessment as metrics able to emulate the integral mechanisms of human visual systems as well as to correlate well with visual perception of quality.

Francesco C. Morabito, Giuseppe Morabito, Matteo Cacciola, Gianluigi Occhiuto

62. The Allen Brain Atlas

The Allen Brain Atlas is an online publicly available resource that integrates gene expression and connectivity data with neuroanatomical information for the mouse, human, and non-human primate. Launched in 2004 by the Allen Institute for Brain Science, the portal currently receives about 45000 unique users each month. More than one petabyte of in situ hybridization imagery and over 240 million microarray data points from six adult human brains representing 3700 tissue samples have been generated to date. As one of the most comprehensive gene expression resources for the nervous system, scientists regularly use these resources to study the expression profile of genes in the various regions of the brain. Additional usage includes searching for biomarkers, correlating gene expression to neuroanatomy, and other large-scale correlative data analysis. This chapter reviews the resources available and describes how they were constructed to enable development of visualization and search tools to analyze the massive amount of data generated. Finally, examples are provided on how these tools can be leveraged for scientific discovery.

Michael Hawrylycz, Lydia Ng, David Feng, Susan Sunkin, Aaron Szafer, Chinh Dang

Backmatter
