2005 | Book

Biological and Medical Data Analysis

6th International Symposium, ISBMDA 2005, Aveiro, Portugal, November 10-11, 2005. Proceedings

Editors: José Luís Oliveira, Víctor Maojo, Fernando Martín-Sánchez, António Sousa Pereira

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science

Table of Contents

Frontmatter

Medical Databases and Information Systems

Application of Three-Level Handprinted Documents Recognition in Medical Information Systems

In this paper the application of a novel three-level recognition concept to the processing of structured documents (forms) in medical information systems is presented. The recognition process is decomposed into three levels: character recognition, word recognition and form-contents recognition. At the word and form-contents levels, probabilistic lexicons are available. The decision at the word level is made using the results of character classification, based on character-image analysis, and a probabilistic lexicon treated as a special kind of soft classifier. A novel approach to combining these two classifiers is proposed, in which the fusion procedure interleaves the soft outcomes of both classifiers so as to obtain the best recognition quality. A similar approach is applied at the semantic level, combining the soft outcomes of the word classifier and a probabilistic form lexicon. The proposed algorithms were applied experimentally in a medical information system, and results of the automatic classification of laboratory test order forms on real data are described.
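
As an illustration of the general idea of fusing a soft character classifier with a probabilistic lexicon (a minimal sketch with hypothetical distributions, not the paper's actual interleaving procedure):

    import math

    # Hypothetical soft outputs of a character classifier: for each position,
    # a probability distribution over candidate characters.
    char_posteriors = [
        {"c": 0.6, "e": 0.4},
        {"a": 0.7, "o": 0.3},
        {"t": 0.9, "r": 0.1},
    ]

    # Hypothetical probabilistic lexicon: prior probabilities of whole words.
    lexicon = {"cat": 0.5, "car": 0.2, "eat": 0.3}

    def word_score(word):
        """Combine per-character soft scores with the lexicon prior
        (a naive product-of-probabilities fusion, not the paper's method)."""
        if len(word) != len(char_posteriors):
            return float("-inf")
        score = math.log(lexicon.get(word, 1e-9))
        for ch, posterior in zip(word, char_posteriors):
            score += math.log(posterior.get(ch, 1e-9))
        return score

    best = max(lexicon, key=word_score)
    print(best)  # -> 'cat'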

Jerzy Sas, Marek Kurzynski
Data Management and Visualization Issues in a Fully Digital Echocardiography Laboratory

This paper presents a PACS solution for echocardiography laboratories, named Himage, that provides a cost-efficient digital archive and enables the acquisition, storage, transmission and visualization of DICOM cardiovascular ultrasound sequences. The core of our approach is the implementation of a DICOM private transfer syntax designed to support any video encoder installed on the operating system. This structure provides great flexibility in selecting the encoder that best suits the specifics of a particular imaging modality or working scenario. The major advantage of the proposed system stems from the high compression rate achieved by video-encoding the ultrasound sequences at a proven diagnostic quality. This highly efficient encoding process ensures full online availability of the ultrasound studies and, at the same time, enables medical data transmission over the low-bandwidth channels that are often encountered in long-range telemedicine sessions. We herein propose an imaging solution that embeds a Web framework with a set of DICOM services for image visualization and manipulation, which have so far been traditionally restricted to intranet environments.

Carlos Costa, José Luís Oliveira, Augusto Silva, Vasco Gama Ribeiro, José Ribeiro
A Framework Based on Web Services and Grid Technologies for Medical Image Registration

Medical imaging involves complex post-processing tasks, such as segmentation, rendering or registration, which require resources that exceed the capabilities of conventional systems. Grid technologies can be an efficient solution, making more productive use of shared resources. However, the difficulty of using Grid technologies has limited their spread outside the scientific arena.

This article tackles the problem of using Grid technologies for the co-registration of a series of volumetric medical images. The co-registration of time series of images is a necessary pre-processing task when analysing the evolution of the diffusion of contrast agents. This processing requires large computational resources and cannot be tackled efficiently on an individual basis. This article proposes and implements a four-level software architecture that provides a simple interface to the user and deals transparently with the complexity of the Grid environment. The four layers implemented are: the Grid layer (the closest to the Grid infrastructure), the Gate-to-Grid layer (which transforms user requests into Grid operations), the Web Services layer (which provides a simple, standard and ubiquitous interface to the user) and the application layer.

An application has been developed on top of this architecture to manage the execution of multi-parametric groups of co-registration actions on a large set of medical images. The execution has been performed on the EGEE Grid infrastructure. The application is platform-independent and can be used from any computer without special requirements.

Ignacio Blanquer, Vicente Hernández, Ferran Mas, Damià Segrelles
Biomedical Image Processing Integration Through INBIOMED: A Web Services-Based Platform

New biomedical technologies need to be integrated for research on complex diseases. It is necessary to combine and analyze information coming from different sources: genetic-molecular data, clinical data and environmental risks. This paper presents the work carried out by the INBIOMED research network in the field of biomedical image analysis. The overall objective is to respond to the growing demand for advanced information processing methods for developing analysis tools, creating knowledge structures and validating them in pharmacogenetics, epidemiology, and molecular and image-based diagnosis research environments. All the image processing tools and data are integrated and work within a web services-based application, the so-called INBIOMED platform. Finally, several biomedical research labs provided real data and validated the network's tools and methods on the most prevalent pathologies: cancer, cardiovascular and neurological diseases. This work provides a unique biomedical information processing platform, open to the incorporation of data coming from other disease networks.

David Pérez del Rey, José Crespo, Alberto Anguita, Juan Luis Pérez Ordóñez, Julián Dorado, Gloria Bueno, Vicente Feliú, Antonio Estruch, José Antonio Heredia
The Ontological Lens: Zooming in and out from Genomic to Clinical Level

Ontology is the talk of the day in the medical informatics community. Its relevant role in the design and implementation of information systems in health care is now widely acknowledged. In this paper we present two case studies showing ontologies “at work” in the genomic domain and in the clinical context. First we show how ontologies and genomic controlled vocabularies can be effectively applied to support a genomic approach to the comprehension of fundamental biological processes and complex cellular patho-physiological mechanisms, and hence biological knowledge mining and discovery. Subsequently, as far as the clinical context is concerned, we emphasize the relevance of ontologies for maintaining the semantic consistency of patient data in a continuity-of-care scenario. In conclusion, we advocate that a deep analysis of the structure and the concepts present at different granular levels – from genes to organs – is needed in order to bridge these different domains and to unify biomedical knowledge in a single paradigm.

Domenico M. Pisanelli, Francesco Pinciroli, Marco Masseroli

Data Analysis and Image Processing

Dynamics of Vertebral Column Observed by Stereovision and Recurrent Neural Network Model

A new non-invasive method for investigating the movement of selected points on the vertebral column is presented. The positions of points marked on the patient’s body are registered by 4 infrared cameras. This experiment enables the reconstruction of 3-dimensional trajectories of the displacement of the marked points. We introduce recurrent neural networks as formal nonlinear dynamical models of each point’s trajectory. These models are based only on experimental data and use a minimal number of parameters. They are therefore suitable for pattern recognition problems.

C. Fernando Mugarra Gonzalez, Stanisław Jankowski, Jacek J. Dusza, Vicente Carrilero López, Javier M. Duart Clemente
Endocardial Tracking in Contrast Echocardiography Using Optical Flow

Myocardial Contrast Echocardiography (MCE) is a recent technique that allows regional perfusion in the cardiac wall to be measured. Segmentation of MCE sequences would allow simultaneous evaluation of perfusion and wall motion. This paper deals with the application of partial differential equations (PDE) to tracking the endocardial wall. We use a variational optical flow method, which we solve numerically with a multigrid approach adapted to the MCE modality. The data sequences are first smoothed, and a hierarchical-iterative procedure is implemented to correctly estimate the magnitude of the flow field. The method is tested on several sequences, showing promising results for automatic wall tracking.

Norberto Malpica, Juan F. Garamendi, Manuel Desco, Emanuele Schiavi
Unfolding of Virtual Endoscopy Using Ray-Template

Unfolding, one of the virtual endoscopy techniques, produces a flattened image of the inner surface of an organ, which is more suitable for diagnosis and polyp detection. Most common unfolding methods use radial ray casting along a pre-computed central path. However, this may produce false images, with deformations or lost information, because adjacent ray planes cross when the organ’s curvature is relatively high. Several methods have been presented to solve this, but they incur severe computational overhead. We propose an efficient crossing-free ray casting method for unfolding. It computes ray-cones according to the curvature of the path. Then, to avoid intersections between ray-cones, it adjusts the direction of any ray-cones detected during intersection testing. Lastly, it determines the direction of all rays fired from sample points between control points by simple linear interpolation. Experimental results show that it produces accurate images of a virtually dissected colon in little time.

Hye-Jin Lee, Sukhyun Lim, Byeong-Seok Shin

Knowledge Discovery and Data Mining

Integration of Genetic and Medical Information Through a Web Crawler System

The huge amount of information coming from genomics and proteomics research is expected to give rise to a new clinical practice, in which diagnosis and treatment will be supported by information at the molecular level. However, navigating through bioinformatics databases can be an overly complex and unproductive task.

In this paper we present an information retrieval engine that is being used to gather and join information about rare diseases, from the phenotype to the genotype, in a public web portal – diseasecard.org.

Gaspar Dias, José Luís Oliveira, Francisco-Javier Vicente, Fernando Martín-Sánchez
Vertical Integration of Bioinformatics Tools and Information Processing on Analysis Outcome

Biological sources integration has been addressed in several frameworks, considering both information sources incompatibilities and data representation heterogeneities. Most of these frameworks are mainly focused on coping with interoperability constraints among distributed databases that contain diverse types of biological data. In this paper, we propose an XML-based architecture that extends integration efforts from the distributed data sources domain to heterogeneous Bioinformatics tools of similar functionalities (“vertical integration”). The proposed architecture is based on the mediator/wrapper integration paradigm and a set of prescribed definitions that associates the capabilities and functional constraints of each analysis tool. The resulting XML-formatted information is further exploited by a visualization module that generates comparative views of the analysis outcome and a query mechanism that handles multiple information sources. The applicability of the proposed integration architecture and the information handling mechanisms was tested and substantiated on widely-known ab-initio gene finders that are publicly accessible through Web interfaces.

Andigoni Malousi, Vassilis Koutkias, Ioanna Chouvarda, Nicos Maglaveras
A Grid Infrastructure for Text Mining of Full Text Articles and Creation of a Knowledge Base of Gene Relations

We demonstrate the application of a grid infrastructure for conducting text mining over distributed data and computational resources. The approach is based on using LexiQuest Mine, a text mining workbench, in a grid computing environment. We describe our architecture and approach and provide an illustrative example of mining full-text journal articles to create a knowledge base of gene relations. The number of patterns found increased from 0.74 per full-text article for a corpus of 1,000 articles to 0.83 when the corpus contained 5,000 articles. However, mining a corpus of 5,000 full-text articles took 26 hours on a single computer, whilst the process was completed in less than 2.5 hours on a grid comprising 20 computers. Thus, whilst increasing the size of the corpus improved the efficiency of the text-mining process, a grid infrastructure was required to complete the task in a timely manner.

Jeyakumar Natarajan, Niranjan Mulay, Catherine DeSesa, Catherine J. Hack, Werner Dubitzky, Eric G. Bremer
Prediction of the Performance of Human Liver Cell Bioreactors by Donor Organ Data

Human liver cell bioreactors are used in extracorporeal liver support therapy. To optimize bioreactor operation with respect to clinical application, an early prediction of long-term bioreactor culture performance is of interest. Data from 70 liver cell bioreactor runs labeled as low (n=18), medium (n=34) and high (n=18) performance were analyzed by statistical and machine learning methods. 25 variables characterizing donor organ properties, organ preservation, cell isolation and cell inoculation prior to bioreactor operation were analyzed with respect to their importance for bioreactor performance prediction. The results obtained were compared and assessed with respect to their robustness. The inoculated volume of liver cells was found to be the most relevant variable, allowing the prediction of low versus medium/high bioreactor performance with an accuracy of 84%.

Wolfgang Schmidt-Heck, Katrin Zeilinger, Gesine Pless, Joerg C. Gerlach, Michael Pfaff, Reinhard Guthke
A Bioinformatic Approach to Epigenetic Susceptibility in Non-disjunctional Diseases

The aim of this work is to present a fully “in silico” approach to the identification of genes that might be involved in susceptibility to non-disjunction diseases and their regulation by methylation processes. We carried out a strategy based on the use of bioinformatics databases and programs available online for the retrieval and identification of interesting genes. As a result we obtained 29 putative susceptibility genes regulated by methylation processes. We needed neither to develop new software nor to carry out clinical laboratory experiments to identify these genes. We consider this “in silico” methodology robust enough to provide candidate genes that should then be checked “in vivo”, given the clinical relevance of non-disjunction diseases, with the aim of providing new tools and criteria for their diagnosis.

Ismael Ejarque, Guillermo López-Campos, Michel Herranz, Francisco-Javier Vicente, Fernando Martín-Sánchez
Foreseeing Promising Bio-medical Findings for Effective Applications of Data Mining

The increasing availability of automated data collection tools, database technologies and Information and Communication Technologies in biomedicine and health care has led to huge amounts of biomedical and health-care data accumulating in several repositories. Unfortunately, analysing such data is a complex task, also because data volumes grow exponentially, so manual analysis and interpretation become impractical. Fortunately, knowledge discovery in databases (KDD) and data mining (DM) are powerful tools available to medical and research personnel to help them explore data and discover useful knowledge. To assess the spread of DM and KDD in biomedicine and health care, we designed and performed a search of databases of biomedical and health-care scientific literature for the years 1997-2004 and analyzed the results. There has been an increase in the application of DM methods in the biomedical informatics research literature, most of it in the bioinformatics and genomics areas.

Stefano Bonacina, Marco Masseroli, Francesco Pinciroli

Statistical Methods and Tools for Biomedical Data Analysis

Hybridizing Sparse Component Analysis with Genetic Algorithms for Blind Source Separation

Nonnegative Matrix Factorization (NMF) has proven to be a useful tool for the analysis of nonnegative multivariate data. However, it is known not to lead to unique results when applied to nonnegative Blind Source Separation (BSS) problems. In this paper we present first results of an extension to the NMF algorithm which solves the BSS problem when the underlying sources are sufficiently sparse. As the proposed target function has many local minima, we use a genetic algorithm for its minimization.
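
The abstract presupposes the standard NMF machinery; for background, here is a minimal sketch of the classic Lee-Seung multiplicative updates for the Euclidean cost (the paper's sparseness-driven target function and its genetic-algorithm minimization are not reproduced here):

    import numpy as np

    def nmf(X, k, iters=500, eps=1e-9):
        """Factor a nonnegative matrix X (m x n) into W (m x k) @ H (k x n)
        using Lee-Seung multiplicative updates for the Frobenius cost."""
        rng = np.random.default_rng(0)
        m, n = X.shape
        W = rng.random((m, k)) + eps
        H = rng.random((k, n)) + eps
        for _ in range(iters):
            H *= (W.T @ X) / (W.T @ W @ H + eps)   # update H, W fixed
            W *= (X @ H.T) / (W @ H @ H.T + eps)   # update W, H fixed
        return W, H

    # Toy nonnegative data standing in for mixed nonnegative sources.
    X = np.random.default_rng(1).random((10, 40))
    W, H = nmf(X, k=2)
    print(np.linalg.norm(X - W @ H))  # reconstruction error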

Kurt Stadlthanner, Fabian J. Theis, Carlos G. Puntonet, Juan M. Górriz, Ana Maria Tomé, Elmar W. Lang
Hardware Approach to the Artificial Hand Control Algorithm Realization

The concept of implementing a bioprosthesis control system in dedicated hardware is presented. The complete control algorithm was analysed and decomposed to reveal the parts that could be calculated concurrently. Specialized digital circuits providing the wavelet transform and the neural network calculations were designed and successfully verified. The experimental results show that the proposed solution provides the desired dexterity and agility of the artificial hand.

Andrzej R. Wolczowski, Przemyslaw M. Szecówka, Krzysztof Krysztoforski, Mateusz Kowalski
Improving the Therapeutic Performance of a Medical Bayesian Network Using Noisy Threshold Models

Treatment management in critically ill patients needs to be efficient, as delay in treatment may give rise to deterioration in the patient’s condition. Ventilator-associated pneumonia (VAP) occurs in patients who are mechanically ventilated in intensive care units. As it is quite difficult to diagnose and treat VAP, some form of computer-based decision support might be helpful. As diagnosing and treating disorders in medicine involves reasoning with uncertainty, we have used a Bayesian network as our primary tool for building a decision-support system for the clinical management of VAP. The effects of antibiotics on colonisation with various pathogens and subsequent antibiotic choices in case of VAP were modelled in the Bayesian network using the notion of causal independence. In particular, the conditional probability distribution of the random variable that represents the overall coverage of pathogens by antibiotics was modelled in terms of the conjunctive effect of the seven different pathogens, usually referred to as the noisy-AND gate. In this paper, we investigate generalisations of the noisy-AND, called noisy threshold models. It is shown that they offer a means for further improvement to the performance of the Bayesian network.
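
To make the gates concrete: under a noisy-AND, overall coverage requires every pathogen to be covered independently, while a noisy threshold model relaxes this to "at least k of n". A small sketch with hypothetical coverage probabilities (the paper's actual parameters and any leak terms are not given in the abstract):

    from math import prod

    # Hypothetical probabilities that a therapy covers each of 7 pathogens.
    p_covered = [0.95, 0.90, 0.80, 0.99, 0.85, 0.70, 0.92]

    # Noisy-AND: all pathogens must be covered (independence assumed).
    p_and = prod(p_covered)

    def p_at_least(k, probs):
        """P(at least k of the independent events occur), via a simple
        dynamic program over the Poisson-binomial distribution."""
        dist = [1.0]  # dist[j] = P(exactly j events so far)
        for p in probs:
            new = [0.0] * (len(dist) + 1)
            for j, q in enumerate(dist):
                new[j] += q * (1 - p)
                new[j + 1] += q * p
            dist = new
        return sum(dist[k:])

    print(f"noisy-AND coverage:    {p_and:.3f}")
    print(f"noisy threshold (k=6): {p_at_least(6, p_covered):.3f}")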

Stefan Visscher, Peter Lucas, Marc Bonten, Karin Schurink
SVM Detection of Premature Ectopic Excitations Based on Modified PCA

The paper presents a modified version of principal component analysis of 3-channel Holter recordings that enables the construction of one linear SVM classifier for a selected group of patients with arrhythmias. Our classifier has perfect generalization properties. We studied the discrimination of premature ventricular excitations from normal ones. The high rate of correct classification (95%) is due to orienting the system of coordinates along the largest eigenvector of the normal heart action of every patient under study.
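
A minimal sketch of the general recipe the abstract suggests, on synthetic data: rotate each patient's features into the eigenbasis of their normal beats, then train one linear SVM (scikit-learn assumed; this is not the authors' exact preprocessing):

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)

    def align_to_normal_axes(beats, normal_beats):
        """Rotate feature vectors into the eigenbasis of the patient's
        normal heart action (largest eigenvector first)."""
        centered = normal_beats - normal_beats.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
        order = np.argsort(eigvals)[::-1]          # descending eigenvalues
        return (beats - normal_beats.mean(axis=0)) @ eigvecs[:, order]

    # Synthetic stand-in for 3-channel beat features of one patient.
    normal = rng.normal(0, 1, (200, 3)) @ np.diag([3.0, 1.0, 0.3])
    ectopic = rng.normal(2, 1, (40, 3))

    X = align_to_normal_axes(np.vstack([normal, ectopic]), normal)
    y = np.array([0] * 200 + [1] * 40)

    clf = LinearSVC(dual=False).fit(X, y)
    print(f"training accuracy: {clf.score(X, y):.2f}")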

Stanisław Jankowski, Jacek J. Dusza, Mariusz Wierzbowski, Artur Oręziak

Decision Support Systems

A Text Corpora-Based Estimation of the Familiarity of Health Terminology

In a pilot effort to improve health communication, we created a method for measuring the familiarity of various medical terms. To obtain term familiarity data, we recruited 21 volunteers who agreed to take medical terminology quizzes containing 68 terms. We then created predictive models for familiarity based on term occurrence in text corpora and readers’ demographics. Although the sample size was small, our preliminary results indicate that predicting the familiarity of medical terms based on an analysis of their frequency in text corpora is feasible. Further, individualized familiarity assessment is feasible when demographic features are included as predictors.

Qing Zeng, Eunjung Kim, Jon Crowell, Tony Tse
On Sample Size and Classification Accuracy: A Performance Comparison

We investigate the dependency between sample size and classification accuracy for three classification techniques: Naïve Bayes, Support Vector Machines and Decision Trees, over a set of 8,500 text excerpts extracted automatically from narrative reports from the Brigham & Women’s Hospital, Boston, USA. Each excerpt refers to the smoking status of a patient as: current, past, never a smoker or denies smoking. Our empirical results, consistent with [1], confirm that the size of the training set and the classification rate are indeed correlated. Even though these algorithms perform reasonably well with small datasets, as the number of cases increases both SVM and Decision Trees show a substantial improvement in performance, suggesting a more consistent learning process. Unlike the majority of evaluations, ours were carried out specifically in a medical domain, where a limited amount of data is a common occurrence [13][14]. This study is part of the I2B2 project, Core 2.
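
The shape of such an experiment can be sketched as follows (synthetic data standing in for the I2B2 excerpts, which are not reproducible here; scikit-learn assumed):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic 4-class stand-in for the smoking-status excerpts.
    X, y = make_classification(n_samples=8500, n_features=50,
                               n_informative=10, n_classes=4, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)

    models = {"NB": GaussianNB(), "SVM": SVC(), "DT": DecisionTreeClassifier()}
    for n in (100, 500, 2000, len(X_tr)):        # growing training sets
        scores = {name: m.fit(X_tr[:n], y_tr[:n]).score(X_te, y_te)
                  for name, m in models.items()}
        print(n, {k: round(v, 3) for k, v in scores.items()})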

Margarita Sordo, Qing Zeng
Influenza Forecast: Comparison of Case-Based Reasoning and Statistical Methods

Influenza is the last of the classic plagues of the past that still has to be brought under control. It incurs substantial costs: prolonged stays in hospitals and, especially, many days of unfitness for work. Therefore many of the most developed countries have started to create influenza surveillance systems. Mostly statistical methods are applied to predict influenza epidemics. However, the results are rather moderate, because influenza waves occur in irregular cycles. We have developed a method that combines Case-Based Reasoning with temporal abstraction. Here we compare experimental results of our method and of statistical methods.

Tina Waligora, Rainer Schmidt
Tumor Classification from Gene Expression Data: A Coding-Based Multiclass Learning Approach

The effectiveness of cancer treatment depends strongly on an accurate diagnosis. In this paper we propose a system for the automatic and precise diagnosis of a tumor’s origin based on genetic data. The system is based on a combination of coding theory techniques and machine learning algorithms. In particular, tumor classification is cast as a multiclass learning setup, in which gene expression values enable the system to distinguish between tumor types. Since multiclass learning is intrinsically complex, the data is divided into several biclass problems whose results are combined with an error-correcting linear block code. The robustness of the prediction is increased as errors of the base binary classifiers are corrected by the linear code. Promising results have been achieved, with a best-case precision of 72% when the system was tested on real data from cancer patients.
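
As an illustration of error-correcting output codes, the coding-based idea at the heart of the paper, a minimal sketch; the paper's specific linear block code and base learners are not given in the abstract, so a fixed code matrix and logistic-regression base classifiers stand in:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One 8-bit codeword per tumor class (minimum Hamming distance 4,
    # so single base-classifier errors can be corrected).
    codebook = np.array([[0, 0, 0, 0, 1, 1, 1, 1],
                         [0, 0, 1, 1, 0, 0, 1, 1],
                         [0, 1, 0, 1, 0, 1, 0, 1],
                         [1, 1, 1, 0, 1, 0, 0, 0]])
    n_classes, code_len = codebook.shape

    rng = np.random.default_rng(0)
    # Toy expression data: 200 samples x 30 genes, 4 tumor classes.
    X = rng.normal(size=(200, 30))
    y = rng.integers(0, n_classes, 200)
    X += codebook[y] @ rng.normal(size=(code_len, 30))  # inject class structure

    # One binary classifier per code bit.
    bits = [LogisticRegression(max_iter=1000).fit(X, codebook[y, j])
            for j in range(code_len)]

    def predict(x):
        """Decode by nearest codeword in Hamming distance."""
        word = np.array([b.predict(x.reshape(1, -1))[0] for b in bits])
        return np.argmin(np.abs(codebook - word).sum(axis=1))

    print(np.mean([predict(xi) == yi for xi, yi in zip(X, y)]))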

Alexander Hüntemann, José C. González, Elizabeth Tapia
Boosted Decision Trees for Diagnosis Type of Hypertension

Inductive learning algorithms are attractive methods for generating hierarchical classifiers. They form hypotheses of the target concept on the basis of a set of labeled examples. This paper presents some decision tree induction methods, the boosting concept, and their usefulness for diagnosing the type of hypertension (essential hypertension and five types of secondary hypertension: fibroplastic renal artery stenosis, atheromatous renal artery stenosis, Conn’s syndrome, renal cystic disease and pheochromocytoma). The decision on the type of hypertension is made only on the basis of blood pressure, general information and basic biochemical data.
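
For the boosting side, a minimal scikit-learn example on synthetic data (the paper's clinical variables and exact tree inducers are not reproduced):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for blood-pressure, general and biochemical features,
    # six classes (essential hypertension plus five secondary types).
    X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                               n_classes=6, n_clusters_per_class=1,
                               random_state=0)

    # AdaBoost over shallow decision trees (scikit-learn's default base tree).
    boosted = AdaBoostClassifier(n_estimators=100, random_state=0)
    print(cross_val_score(boosted, X, y, cv=5).mean())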

Michal Wozniak
Markov Chains Pattern Recognition Approach Applied to the Medical Diagnosis Tasks

In many medical decision problems there exist dependencies between subsequent diagnoses of the same patient. Among the different concepts and methods for using “contextual” information in pattern recognition, the approach via Bayes compound decision theory is attractive and efficient from both a theoretical and a practical point of view. The paper presents a probabilistic approach (based on expert rules and a learning set) to the problem of recognizing the state of acid-base balance and to the problem of computer-aided anti-hypertension drug therapy. The quality of the obtained classifiers is compared to the frequencies of correct classification of three neural nets.
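
A small sketch of the compound-Bayes idea: the posterior at each instant combines the current observation's class likelihoods with a transition model over successive patient states (all numbers are hypothetical, not from the paper):

    import numpy as np

    # Hypothetical 3-state acid-base balance model: transition[i, j] is the
    # probability of moving from state i to state j between instants, and
    # each row of likelihoods holds p(observation_t | state) from a base
    # classifier at instant t.
    transition = np.array([[0.8, 0.1, 0.1],
                           [0.2, 0.7, 0.1],
                           [0.1, 0.2, 0.7]])
    likelihoods = np.array([[0.6, 0.3, 0.1],    # t = 0
                            [0.2, 0.5, 0.3],    # t = 1
                            [0.1, 0.2, 0.7]])   # t = 2

    posterior = np.full(3, 1 / 3)               # uniform prior over states
    for lik in likelihoods:
        predicted = transition.T @ posterior    # propagate context forward
        posterior = predicted * lik             # weight by current evidence
        posterior /= posterior.sum()            # renormalize (Bayes)
        print(posterior.argmax(), posterior.round(3))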

Michal Wozniak
Computer-Aided Sequential Diagnosis Using Fuzzy Relations – Comparative Analysis of Methods

A specific feature of the explored diagnosis task is the dependence between the patient’s states at particular instants, which should be taken into account in sequential diagnosis algorithms. In this paper, methods for performing sequential diagnosis using a fuzzy relation on the product of the set of diagnoses and the fuzzified feature space are developed and evaluated. In the proposed method, the fuzzy relation is first determined from the learning set as the solution of an appropriate optimization problem; this relation, in the form of a matrix of membership grade values, is then used at successive instants of the sequential diagnosis process. Different algorithms of sequential diagnosis, which differ in both their input data sets and their procedures, are described. The proposed algorithms were applied in practice to the computer-aided recognition of a patient’s acid-base equilibrium states, with a genetic algorithm used as the optimization procedure. Results of a comparative experimental analysis of the investigated algorithms with respect to classification accuracy are also presented and discussed.

Marek Kurzynski, Andrzej Zolnierek

Collaborative Systems in Biomedical Informatics

Service Oriented Architecture for Biomedical Collaborative Research

Following a systems engineering approach, we have identified the information system requirements for biomedical collaborative research. We have designed a Service Oriented Architecture following a dynamic and adaptable-to-change approach, using technology and specifications that are being developed openly, utilizing industry partnerships and broad consortia such as the W3C and the Organization for the Advancement of Structured Information Standards (OASIS), and based on the standards and technology that are the foundation of the Internet. The design has been translated into a pilot implementation infrastructure (INBIOMED) that is now being populated with web services for data and image analysis and collaborative management.

José Antonio Heredia, Antonio Estruch, Oscar Coltell, David Pérez del Rey, Guillermo de la Calle, Juan Pedro Sánchez, Ferran Sanz
Simultaneous Scheduling of Replication and Computation for Bioinformatic Applications on the Grid

One of the first motivations for using grids comes from applications managing large data sets, for example in High Energy Physics or the Life Sciences. To improve the global throughput of software environments, replicas are usually placed at wisely selected sites. Moreover, computation requests have to be scheduled among the available resources. To get the best performance, scheduling and data replication have to be tightly coupled, which is not always the case in existing approaches.

This paper presents an algorithm that combines data management and scheduling at the same time using a steady-state approach. Our theoretical results are validated using simulation and logs from a large life science application (ACI GRID GriPPS). The PattInProt application searches sites and signatures of proteins into databanks of protein sequences.

Frédéric Desprez, Antoine Vernois, Christophe Blanchet
The INFOBIOMED Network of Excellence: Developments for Facilitating Training and Mobility

Enhancing training and mobility in the area of Biomedical Informatics (BMI) is one of the most important objectives of the European Network of Excellence INFOBIOMED. Based on the lessons learned from previous decades of experience in teaching Medical Informatics and Bioinformatics, an action plan has been elaborated. This plan is structured into three actions: (a) a survey to analyze and evaluate the situation, needs and expectations in BMI; (b) a Biomedical Informatics course database (ICD) containing the relevant keywords in the area; and (c) the design and implementation of a Mobility Brokerage Service (MBS) to enhance mobility and exchanges in the area. This paper describes the overall approach and the technical characteristics of the MBS. It follows an innovative service-oriented architecture based on Web Services, providing distributed access to on-line information sources. This approach is being evaluated and reused for different research applications within the Network.

Guillermo de la Calle, Mario Benito, Juan Luis Moreno, Eva Molero

Bioinformatics: Computational Models

Using Treemaps to Visualize Phylogenetic Trees

Over recent years the field of phylogenetics has witnessed significant algorithmic and technical progress. A new class of efficient phylogeny programs allows for the computation of large evolutionary trees comprising 500–1,000 organisms within a couple of hours on a single CPU under elaborate optimization criteria. However, it is difficult to extract the valuable information contained in those large trees without appropriate visualization tools. As a potential solution we propose the application of treemaps to visualize large phylogenies (evolutionary trees) and improve knowledge retrieval. In addition, we propose a hybrid tree/treemap representation which provides a detailed view of subtrees via treemaps while maintaining a contextual view of the entire topology at the same time. Moreover, we demonstrate how it can be deployed to visualize an evolutionary tree comprising 2,415 mammals. The respective software package is available on-line at www.ics.forth.gr/~stamatak.
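
To make the treemap idea concrete, here is a minimal slice-and-dice layout for a toy phylogeny (the generic algorithm, not the authors' software):

    def leaf_count(node):
        children = node.get("children", [])
        return 1 if not children else sum(leaf_count(c) for c in children)

    def treemap(node, x, y, w, h, depth=0, out=None):
        """Slice-and-dice treemap: recursively split the rectangle among
        children in proportion to their leaf counts, alternating the
        split direction with depth."""
        if out is None:
            out = []
        out.append((node["name"], round(x, 2), round(y, 2),
                    round(w, 2), round(h, 2)))
        children = node.get("children", [])
        if not children:
            return out
        total = sum(leaf_count(c) for c in children)
        offset = 0.0
        for child in children:
            frac = leaf_count(child) / total
            if depth % 2 == 0:   # split horizontally
                treemap(child, x + offset * w, y, w * frac, h, depth + 1, out)
            else:                # split vertically
                treemap(child, x, y + offset * h, w, h * frac, depth + 1, out)
            offset += frac
        return out

    # Toy phylogeny; the rectangles could be drawn with any plotting library.
    tree = {"name": "root", "children": [
        {"name": "primates", "children": [{"name": "human"}, {"name": "chimp"}]},
        {"name": "rodents", "children": [{"name": "mouse"}, {"name": "rat"},
                                         {"name": "squirrel"}]}]}
    for rect in treemap(tree, 0, 0, 1, 1):
        print(rect)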

Adam Arvelakis, Martin Reczko, Alexandros Stamatakis, Alkiviadis Symeonidis, Ioannis G. Tollis
An Ontological Approach to Represent Molecular Structure Information

Current approaches using Artificial Intelligence techniques applied to chemistry use representations inherited from existing tools. These tools describe chemical compounds with a set of structure-activity relationship (SAR) descriptors because they were developed mainly for the task of drug design. We propose an ontology based on the chemical nomenclature as a way to capture the concepts commonly used by chemists in describing the molecular structure of compounds. In this paper we formally specify the concepts and relationships of the chemical nomenclature in a comprehensive ontology using a form of relational representation called feature terms. We also provide several examples of describing chemical compounds using this ontology and compare our proposal with other SAR-based approaches.

Eva Armengol, Enric Plaza
Focal Activity in Simulated LQT2 Models at Rapid Ventricular Pacing: Analysis of Cardiac Electrical Activity Using Grid-Based Computation

This study investigated the involvement of ventricular focal activity and dispersion of repolarization in LQT2 models at rapid rates. The Luo-Rudy dynamic model was used to simulate ventricular tissues. LQT2 syndrome due to genetic mutations was modeled by modifying the conductances of delayed rectifier potassium currents. Cellular automata were employed to generate virtual tissues coupled with midmyocardial (M) cell clusters. Simulations were conducted using grid-based computation. Under LQT2 conditions, early after-depolarizations (EADs) occurred first at the border of the M refractory zone in epicardium coupled with M clusters, but spiked off from endocardial cells in endocardium coupled with M clusters. The waveform of EADs was affected by the topological distribution of M clusters. Our results explain why subepicardial and subendocardial cells can, surprisingly, exhibit EADs when adjacent to M cells, and suggest that phase 2 EADs are responsible for the onset of Torsade de Pointes at rapid ventricular pacing.

Chong Wang, Antje Krause, Chris Nugent, Werner Dubitzky

Bioinformatics: Structural Analysis

Extracting Molecular Diversity Between Populations Through Sequence Alignments

The use of sequence alignments for establishing protein homology relationships has an extensive tradition in the field of bioinformatics, and there is an increasing desire for more statistical methods in the data analysis. We present statistical methods and algorithms that are useful when the protein alignments can be divided into two or more populations based on known features or traits. The algorithms are considered valuable for discovering differences between populations at a molecular level. The approach is illustrated with examples from real biological data sets, and we present experimental results in applying our work on bacterial populations of Vibrio, where the populations are defined by optimal growth temperature, T_opt.

Steinar Thorvaldsen, Tor Flå, Nils P. Willassen
Detection of Hydrophobic Clusters in Molecular Dynamics Protein Unfolding Simulations Using Association Rules

One way of exploring protein unfolding events associated with the development of Amyloid diseases is through the use of multiple Molecular Dynamics Protein Unfolding Simulations. The analysis of the huge amount of data generated in these simulations is not a trivial task. In the present report, we demonstrate the use of Association Rules applied to the analysis of the variation profiles of the Solvent Accessible Surface Area of the 127 amino-acid residues of the protein Transthyretin, along multiple simulations. This allowed us to identify a set of 28 hydrophobic residues forming a hydrophobic cluster that might be essential in the unfolding and folding processes of Transthyretin.

Paulo J. Azevedo, Cândida G. Silva, J. Rui Rodrigues, Nuno Loureiro-Ferreira, Rui M. M. Brito
Protein Secondary Structure Classifiers Fusion Using OWA

The combination of classifiers has been proposed as a method to improve the accuracy achieved by a single classifier. In this study, the performance of optimistic and pessimistic ordered weighted averaging (OWA) operators for the fusion of protein secondary structure classifiers has been investigated. Each secondary structure classifier outputs a unique structure for each input residue. We used the confusion matrix of each secondary structure classifier as a general reusable pattern for converting this unique label to the measurement level. The results of the optimistic and pessimistic OWA operators were compared with majority voting and with the five common classifiers used in the fusion process. Using a benchmark set from the EVA server, the results showed a significant improvement in average Q3 prediction accuracy of up to 1.69% over the best classifier’s results.
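
The OWA operator itself is simple to state: sort the classifier scores and take a weighted average, with the weight profile determining optimism or pessimism. A minimal sketch with hypothetical support values:

    def owa(scores, weights):
        """Ordered weighted averaging: weights apply to the sorted scores
        (descending), not to particular classifiers."""
        assert abs(sum(weights) - 1.0) < 1e-9
        return sum(w * s for w, s in zip(weights, sorted(scores, reverse=True)))

    # Hypothetical support values from five secondary-structure classifiers
    # for one residue being in a helix.
    supports = [0.9, 0.7, 0.6, 0.4, 0.2]

    optimistic  = owa(supports, [0.5, 0.3, 0.1, 0.1, 0.0])  # favors high scores
    pessimistic = owa(supports, [0.0, 0.1, 0.1, 0.3, 0.5])  # favors low scores
    print(optimistic, pessimistic)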

Majid Kazemian, Behzad Moshiri, Hamid Nikbakht, Caro Lucas
Efficient Computation of Fitness Function by Pruning in Hydrophobic-Hydrophilic Model

The use of Genetic Algorithms with the 2D Hydrophobic-Hydrophilic (HP) model in protein folding prediction requires frequent fitness function computations. While the fitness computation is linear, the overhead incurred is significant for the protein folding prediction problem. Any reduction in the computational cost will therefore assist in searching the enormous solution space more efficiently. This paper proposes a novel pruning strategy that exploits the inherent properties of the HP model and guarantees a reduction of the computational complexity during an ordered traversal of the amino acid chain sequences for fitness computation, truncating the sequence by at least one residue.
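
For context, the plain (unpruned) HP fitness is a count of non-consecutive H-H lattice contacts; a minimal sketch follows (the paper's pruning strategy itself is not reproduced):

    # 2D HP model: a conformation is a self-avoiding walk on the square lattice.
    MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

    def hp_fitness(sequence, directions):
        """Count H-H contacts between residues adjacent on the lattice but
        not consecutive in the chain (higher is better; often negated)."""
        coords = [(0, 0)]
        for d in directions:
            dx, dy = MOVES[d]
            x, y = coords[-1]
            coords.append((x + dx, y + dy))
        if len(set(coords)) != len(coords):   # not self-avoiding: invalid
            return None
        pos = {c: i for i, c in enumerate(coords)}
        contacts = 0
        for i, (x, y) in enumerate(coords):
            if sequence[i] != "H":
                continue
            for dx, dy in MOVES.values():
                j = pos.get((x + dx, y + dy))
                if j is not None and j > i + 1 and sequence[j] == "H":
                    contacts += 1
        return contacts

    print(hp_fitness("HPHPPHHPHH", "RRUULDLUL"))  # -> 1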

Md. Tamjidul Hoque, Madhu Chetty, Laurence S. Dooley
Evaluation of Fuzzy Measures in Profile Hidden Markov Models for Protein Sequences

In biological problems such as protein sequence family identification and profile building, the additive hypothesis of the probability measure is not well suited for modeling HMM-based profiles, because of the high degree of interdependency among homologous sequences of the same family. Fuzzy measure theory, an extension of the classical additive theory, is obtained by replacing the additivity requirement of classical measures with the weaker properties of monotonicity, continuity and semi-continuity. The strong correlations and the sequence preferences involved in protein structures make fuzzy measure architectures suitable candidates for building profiles of a given family, since fuzzy measures can handle uncertainties better than classical methods. In this paper we investigate different measures (S-decomposable, λ and belief measures) from fuzzy measure theory for building profile models of protein sequences. The proposed fuzzy measure models have been tested on the globin and kinase families. The results obtained from the fuzzy measure models establish the superiority of fuzzy measure theory over classical probability measures for biological sequence problems.

Niranjan P. Bidargaddi, Madhu Chetty, Joarder Kamruzzaman

Bioinformatics: Microarray Data Analysis

Relevance, Redundancy and Differential Prioritization in Feature Selection for Multiclass Gene Expression Data

The large number of genes in microarray data makes feature selection techniques more crucial than ever. From various ranking-based filter procedures to classifier-based wrapper techniques, many studies have devised their own flavor of feature selection technique. Only a handful of studies have delved into the effect of redundancy in the predictor set on classification accuracy, and even fewer into the effect of varying the relative importance of relevance and redundancy. We present a filter-based feature selection technique which incorporates the three elements of relevance, redundancy and differential prioritization. With the aid of differential prioritization, our feature selection technique is capable of achieving better accuracies than those of previous studies, while using fewer genes in the predictor set. At the same time, the pitfalls of over-optimistic estimates of accuracy are avoided through the use of a more realistic evaluation procedure than internal leave-one-out cross-validation.

Chia Huey Ooi, Madhu Chetty, Shyh Wei Teng
Gene Selection and Classification of Human Lymphoma from Microarray Data

Experiments in DNA microarray provide information of thousands of genes, and bioinformatics researchers have analyzed them with various machine learning techniques to diagnose diseases. Recently Support Vector Machines (SVM) have been demonstrated as an effective tool in analyzing microarray data. Previous work involving SVM used every gene in the microarray to classify normal and malignant lymphoid tissue. This paper shows that, using gene selection techniques that selected only 10% of the genes in “Lymphochip” (a DNA microarray developed at Stanford University School of Medicine), a classification accuracy of about 98% is achieved which is a comparable performance to using every gene. This paper thus demonstrates the usefulness of feature selection techniques in conjunction with SVM to improve its performance in analyzing Lymphochip microarray data. The improved performance was evident in terms of better accuracy, ROC (receiver operating characteristics) analysis and faster training. Using the subsets of Lymphochip, this paper then compared the performance of SVM against two other well-known classifiers: multi-layer perceptron (MLP) and linear discriminant analysis (LDA). Experimental results show that SVM outperforms the other two classifiers.

Joarder Kamruzzaman, Suryani Lim, Iqbal Gondal, Rezaul Begg
Microarray Data Analysis and Management in Colorectal Cancer

The availability of microarray technologies has enabled biomedical researchers to explore the expression levels of a complete genome simultaneously. The analysis of gene expression patterns can explain the biological basis of several pathological processes. Deepening the understanding of the molecular processes underlying colorectal cancer may advance its clinical management. This work presents the analysis of microarray data from colon cancer samples in order to determine the differentially expressed genes underlying this disease process. Comparing genome-wide gene expression levels of tumor samples versus healthy controls allows the definition of a set of genes involved in the differentiation of both tissues. The analysis of these differentially expressed genes using Gene Ontology allows the identification of the most prevalent processes that are altered in this disease.

Oscar García-Hernández, Guillermo López-Campos, Juan Pedro Sánchez, Rosa Blanco, Alejandro Romera-Lopez, Beatriz Perez-Villamil, Fernando Martín-Sánchez
Backmatter
Metadata
Title
Biological and Medical Data Analysis
Editors
José Luís Oliveira
Víctor Maojo
Fernando Martín-Sánchez
António Sousa Pereira
Copyright Year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-31658-9
Print ISBN
978-3-540-29674-4
DOI
https://doi.org/10.1007/11573067
