2013 | Book

Basics of Bioinformatics

Lecture Notes of the Graduate Summer School on Bioinformatics of China

Editors: Rui Jiang, Xuegong Zhang, Michael Q. Zhang

Publisher: Springer Berlin Heidelberg

About this book

This book outlines 11 courses and 15 research topics in bioinformatics, based on the curricula and talks of a graduate summer school on bioinformatics held at Tsinghua University. The courses include: Basics for Bioinformatics, Basic Statistics for Bioinformatics, Topics in Computational Genomics, Statistical Methods in Bioinformatics, Algorithms in Computational Biology, Multivariate Statistical Methods in Bioinformatics Research, Association Analysis for Human Diseases: Methods and Examples, Data Mining and Knowledge Discovery Methods with Case Examples, Applied Bioinformatics Tools, Foundations for the Study of Structure and Function of Proteins, Computational Systems Biology Approaches for Deciphering Traditional Chinese Medicine, and Advanced Topics in Bioinformatics and Computational Biology. This book can serve not only as a primer for beginners in bioinformatics but also as a concise yet systematic reference for researchers in the field.

Rui Jiang and Xuegong Zhang are both professors at the Department of Automation, Tsinghua University, China. Professor Michael Q. Zhang works at the Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.

Table of Contents

Frontmatter
Chapter 1. Basics for Bioinformatics
Abstract
Bioinformatics has become a hot research topic in recent years, including in several disciplines that were not previously closely linked with biology. One piece of evidence is that the 2007 Graduate Summer School on Bioinformatics of China received more than 800 applications from graduate students across the nation and from a wide range of disciplines: biological sciences, mathematics and statistics, automation and electrical engineering, computer science and engineering, medical sciences, environmental sciences, and even social sciences. So what is bioinformatics?
Xuegong Zhang, Xueya Zhou, Xiaowo Wang
Chapter 2. Basic Statistics for Bioinformatics
Abstract
Statistics is a branch of mathematics concerned with the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from random samples. Many research topics in computational biology and bioinformatics rely heavily on probabilistic models and statistical methods. It is therefore necessary for students in bioinformatics programs to take introductory statistics as their first course. In this chapter, the basics of statistics are introduced in the following order: foundations of statistics, point estimation, hypothesis testing, interval estimation, analysis of variance (ANOVA), and regression models. The free and open-source statistical computing environment R is also briefly introduced. Students can refer to the book by George Casella and Roger L. Berger [1] and other related textbooks for further study.
Yuanlie Lin, Rui Jiang
Chapter 3. Topics in Computational Genomics
Abstract
Genomics began with large-scale sequencing of the human genome and the genomes of many model organisms around 1990; the rapid accumulation of vast genomic data poses a great challenge: how to decipher such massive molecular information. Like bioinformatics in general, genome informatics is data driven; many computational tools can quickly become obsolete when new technologies and data types become available. With this in mind, a student who wants to work in this fascinating new field must be able to adapt quickly and to “shoot the moving targets” with “just-in-time ammunition.”
Michael Q. Zhang, Andrew D. Smith
Chapter 4. Statistical Methods in Bioinformatics
Abstract
The linear biopolymers, DNA, RNA, and proteins, are the three central molecular building blocks of life. DNA is an information storage molecule. All of the hereditary information of an individual organism is contained in its genome, which consists of sequences of the four DNA bases (nucleotides), A, T, C, and G. RNA has a wide variety of roles, including a small but important set of functions. Proteins, which are chains of 20 different amino acid residues, are the action molecules of life, being responsible for nearly all the functions of all living beings and forming many of life’s structures. All protein sequences are coded by segments of the genome called genes. The universal genetic code is used to translate triplets of DNA bases, called codons, to the 20-letter alphabet of proteins. How genetic information flows from DNA to RNA and then to protein is regarded as the central dogma of molecular biology. Genome sequencing projects, together with the emergence of microarray techniques, have resulted in rapidly growing and publicly available databases of DNA and protein sequences, structures, and genome-wide expression. One of the most interesting questions scientists are concerned with is how to extract useful biological information by “mining” these databases.
Jun S. Liu, Bo Jiang
Chapter 5. Algorithms in Computational Biology
Abstract
What is an algorithm? An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate input in a finite amount of time. Figure 5.1 illustrates the relation between a problem, an algorithm, and the algorithm’s input and output.
Tao Jiang, Jianxing Feng
Chapter 6. Multivariate Statistical Methods in Bioinformatics Research
Abstract
In bioinformatics research, data sets usually contain tens of thousands of variables, such as different genes and proteins. Statistical methods for analyzing such multivariate data sets are important. In this chapter, multivariate statistical methods are reviewed, and some challenges in bioinformatics research are also discussed.
Lingsong Zhang, Xihong Lin
Chapter 7. Association Analysis for Human Diseases: Methods and Examples
Abstract
Many biologists see statistics and statistical analysis of their data more as a nuisance than a necessary tool. After all, they are convinced their data are correct and demonstrate what the experiment was supposed to show. But this view is precisely why we need statistics—researchers tend to see their results in too positive a light. Statistical analysis, if done properly, provides an unbiased assessment of the outcome of an experiment. Here are some additional reasons for using statistics.
Jurg Ott, Qingrun Zhang
Chapter 8. Data Mining and Knowledge Discovery Methods with Case Examples
Abstract
This chapter deals with the area of knowledge discovery and data mining that has emerged as an important research direction for extracting useful information from vast repositories of data of various types. The basic concepts, problems, and challenges are first briefly discussed. Some of the major data mining tasks like classification, clustering, and association rule mining are then described in some detail. This is followed by a description of some tools that are frequently used for data mining. Two case examples of supervised and unsupervised classification for satellite image analysis are presented. Finally, an extensive bibliography is provided.
S. Bandyopadhyay, U. Maulik
Chapter 9. Applied Bioinformatics Tools
Abstract
A hands-on course focused on applying bioinformatics to biological problems was organized at Peking University. The course materials are from http://abc.cbi.pku.edu.cn and are divided into individual pages (separated by lines in the text):
Jingchu Luo
Chapter 10. Foundations for the Study of Structure and Function of Proteins
Abstract
Proteins are the most abundant biological macromolecules, occurring in all cells and all parts of cells. Moreover, proteins exhibit an enormous diversity of biological functions and are the final products of the information pathways. Protein is a major component of protoplasm, which is the basis of life. It is translated from RNA and composed of amino acids connected by peptide bonds. It participates in a series of complicated chemical reactions that ultimately give rise to the phenomena of life. We can therefore say that it is the workhorse molecule and a major player in life activity. Biologists study the structure and function of proteins through their primary, secondary, tertiary, and quaternary structures, posttranslational modifications, protein-protein interactions, DNA-protein interactions, and so on.
Zhirong Sun
Chapter 11. Computational Systems Biology Approaches for Deciphering Traditional Chinese Medicine
Abstract
Traditional Chinese medicine (TCM) is a system with a rich tradition of more than 3,000 years. Compared to Western medicine (WM), TCM is holistic, with an emphasis on regulating the integrity of the human body. However, understanding TCM in the context of a “system” and modernizing TCM both remain problematic. Along with the “omics” revolution has come the era of systems biology (SB). After years of study at the intersection of TCM and SB, we find that bioinformatics and computational systems biology (CSB) approaches may help decipher the scientific basis of TCM. The previous difficulty in directly combining WM and TCM, two distinct medical systems, may also be overcome through the development of systems biology, which tends toward preventive, predictive, and personalized medicine [1].
Shao Li, Le Lu
Chapter 12. Advanced Topics in Bioinformatics and Computational Biology
Abstract
Phylogeny, as defined in the context of evolutionary biology, is the set of connections between all groups of organisms as understood through ancestor/descendant relationships. Since many groups of organisms are now extinct, we cannot have a clear picture of how modern life is interrelated without their fossils. Phylogenetics, the science of phylogeny, is a useful tool that serves as one part of the larger field of systematics, which also includes taxonomy, the practice and science of naming and classifying the diversity of organisms.
Bailin Hao, Chunting Zhang, Yixue Li, Hao Li, Liping Wei, Minoru Kanehisa, Luhua Lai, Runsheng Chen, Nikolaus Rajewsky, Michael Q. Zhang, Jingdong Han, Rui Jiang, Xuegong Zhang, Yanda Li
Erratum
Xuegong Zhang, Xueya Zhou, Xiaowo Wang
Metadata
Title
Basics of Bioinformatics
Editors
Rui Jiang
Xuegong Zhang
Michael Q. Zhang
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-38951-1
Print ISBN
978-3-642-38950-4
DOI
https://doi.org/10.1007/978-3-642-38951-1
