
2008 | Book


Bioconductor Case Studies

Authors: Florian Hahne, Wolfgang Huber, Robert Gentleman, Seth Falcon

Publisher: Springer New York

Book series: Use R!



About this book

Bioconductor software has become a standard tool for the analysis and comprehension of data from high-throughput genomics experiments. Its application spans a broad field of technologies used in contemporary molecular biology. In this volume, the authors present a collection of case studies that apply Bioconductor tools to the analysis of microarray gene expression data. Topics covered include: (1) import and preprocessing of data from various sources; (2) statistical modeling of differential gene expression; (3) biological metadata; (4) application of graphs and graph rendering; (5) machine learning for clustering and classification problems; (6) gene set enrichment analysis.

Each chapter of this book describes an analysis of real data using a hands-on, example-driven approach. Short exercises help in the learning process and invite more advanced consideration of key topics. The book is a dynamic document: all the code shown can be executed on a local computer, and readers are able to reproduce every computation, figure, and table.

Table of Contents

Frontmatter

1. The ALL Dataset

Abstract

In this initial chapter we briefly describe the typical data preprocessing steps for a sample dataset that will be used in many of the following exercises.

F. Hahne, R. Gentleman

2. R and Bioconductor Introduction

Abstract

In this chapter we cover basic uses of R and begin working with Bioconductor datasets and tools. Topics covered include simple R programming, R graphics, and working with environments as hash tables. We introduce the ExpressionSet class as an example of a basic Bioconductor structure used for holding genomic data, in this case expression microarray data. We also explore some visualization techniques for gene expression data to get a feeling for R’s extensive graphical capabilities.

R. Gentleman, F. Hahne, S. Falcon, M. Morgan

3. Processing Affymetrix Expression Data

Abstract

In this chapter we analyze Affymetrix gene expression data. We begin with the CEL files that contain the raw microarray readings, discuss how to do quality assessment, and proceed to normalization and the estimation of expression values. Finally, we determine differentially expressed genes.

R. Gentleman, W. Huber

4. Two Color Arrays

Abstract

In this case study, two RNA samples are compared to each other on 60-mer oligonucleotide microarrays using two-color labeling. The lab covers data import, visualization, exploration and normalization of the data, and the identification of differentially expressed genes.

Florian Hahne, Wolfgang Huber

5. Fold Changes, Log Ratios, Background Correction, Shrinkage Estimation, and Variance Stabilization

Abstract

Microarray data are affected by experimental variability, which is a combination of systematic and stochastic variability. The basic task of microarray preprocessing is to extract quantities of interest from the data while correcting for systematic variations and controlling the stochastic variability. In this exercise we explore the concepts of (log-)ratios, the role of background correction, the idea of shrinkage estimation, and the generalized logarithm. Some tools for this are provided by the package vsn.

W. Huber
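
The ideas named in this abstract can be illustrated with a small sketch. It assumes one common form of the generalized logarithm, glog(x) = log((x + sqrt(x^2 + c^2)) / 2); this parameterization and the constant c are illustrative assumptions and are not taken from the book or from the vsn package. The point is only that a difference of glog values behaves like a log-ratio at high intensities and is shrunk toward zero at low intensities.

```python
import math


def glog(x, c=1.0):
    """Generalized logarithm (illustrative form, not vsn's exact one).

    Defined for zero and negative x, and close to log(x) when x >> c.
    """
    return math.log((x + math.sqrt(x * x + c * c)) / 2.0)


def log_ratio(a, b):
    """Ordinary log-ratio; requires a > 0 and b > 0."""
    return math.log(a) - math.log(b)


def glog_ratio(a, b, c=1.0):
    """Difference of glog values, a shrunken analogue of the log-ratio."""
    return glog(a, c) - glog(b, c)


# A two-fold change at high and at low intensity, with c = 50 (hypothetical).
high = glog_ratio(2000.0, 1000.0, c=50.0)  # close to log(2)
low = glog_ratio(2.0, 1.0, c=50.0)         # shrunk toward 0
print(high, low, log_ratio(2.0, 1.0))
```

At low intensities, where the ordinary log-ratio is dominated by noise, the glog difference stays near zero, which is the shrinkage behavior the abstract refers to.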

6. Easy Differential Expression

Abstract

In this short exercise, we explore the most basic approach to selecting differentially expressed genes between two classes. It has three steps: first, a nonspecific filtering step that removes probes for genes that appear to be always unexpressed or at least not differentially expressed; second, a probe-by-probe statistical test; and third, a multiple testing correction that controls the false discovery rate (FDR). There are many variations and improvements to the procedure shown here, and you can learn more about these in Chapter 7.

F. Hahne, W. Huber
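
The multiple testing step can be sketched as follows. The abstract does not say which FDR procedure the chapter uses; this example assumes the Benjamini-Hochberg adjustment, a widely used choice, and the p-values are made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Each p-value is scaled by m / rank (rank in ascending order), then a
    running minimum from the largest p-value downward keeps the adjusted
    values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-probe p-values from a probe-by-probe test.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)

# Probes whose adjusted p-value is at most 0.05 are called at an FDR of 5%.
selected = [i for i, p in enumerate(adj) if p <= 0.05]
print(adj, selected)
```

Filtering out uninformative probes before testing reduces the number of tests m, which makes this adjustment less severe for the probes that remain.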

7. Differential Expression

Abstract

In this chapter we will cover some of the basic principles of finding differentially expressed genes. We cover nonspecific filtering, multiple testing, the moderated test statistics provided by the limma package, and gene selection by ROC curves.

W. Huber, D. Scholtens, F. Hahne, A. von Heydebreck

8. Annotation and Metadata

Abstract

In this chapter we demonstrate the use of Bioconductor metadata resources. After having obtained a list of reporters from a microarray experiment and mapping them to their target genes, one will want to use the annotation of the genes and gene products to better interpret the experimental results. Often, it is beneficial to use gene annotation in the course of the primary analysis, in order to narrow down the set of data to be considered and ameliorate multiple testing problems, or in order to explore specific biological hypotheses.

W. Huber, F. Hahne

9. Supervised Machine Learning

Abstract

In this chapter we cover some of the basic principles of supervised machine learning. We mainly consider the two-class problem, but also cover some multiclass prediction. We introduce some of the basic concepts in machine learning such as the distance function, the so-called confusion matrix, and cross-validation. We make extensive use of the MLInterfaces package.

R. Gentleman, W. Huber, V. J. Carey

10. Unsupervised Machine Learning

Abstract

In this chapter we explore the use of unsupervised machine learning, or clustering. We cover distances, dimension reduction techniques, and a variety of unsupervised machine learning methods including hierarchical clustering, k-means clustering, and specialized methods, such as those in the hopach package.

R. Gentleman, V. J. Carey

11. Using Graphs for Interactome Data

Abstract

Many data types and many models of biological systems are best described in terms of graphs. Protein–protein interaction data are a prominent example. In this chapter, we explore a curated dataset of protein interactions and perform a statistical analysis of the relationship between protein interaction and coexpression. We also show how to access large-scale protein–protein interaction datasets from the IntAct repository at the EBI.

T. Chiang, S. Falcon, F. Hahne, W. Huber

12. Graph Layout

Abstract

In this chapter we demonstrate how to lay out and render graphs using tools from the Rgraphviz and graph packages.

F. Hahne, W. Huber, R. Gentleman

13. Gene Set Enrichment Analysis

Abstract

Gene Set Enrichment Analysis (GSEA) is an important method for analyzing gene expression data. It is useful for finding biological themes in gene sets, and it can help to increase the statistical power of analyses by aggregating the signal across groups of related genes. In this chapter, we introduce tools available in the Category and GSEABase packages for carrying out gene set enrichment analysis.

R. Gentleman, M. Morgan, W. Huber

14. Hypergeometric Testing Used for Gene Set Enrichment Analysis

Abstract

After the set of interesting genes has been determined, say those that are differentially expressed, a next step in the analysis is to attempt to find functional relationships among those genes that might help better elucidate the underlying biology. These methods typically rely on existing or predefined sets of genes. In this chapter we show how to carry out Hypergeometric tests to identify potentially interesting gene sets.

S. Falcon, R. Gentleman
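
As a minimal sketch of the kind of test this abstract describes, the function below computes a one-sided hypergeometric p-value for the overlap between a list of interesting genes and a predefined gene set. The function name, argument names, and numbers are hypothetical and do not come from the book or from any Bioconductor package.

```python
from math import comb


def hypergeom_upper_tail(k, universe, in_set, selected):
    """P(X >= k) for a hypergeometric X.

    universe: number of genes considered in total
    in_set:   number of those genes that belong to the gene set
    selected: number of interesting (e.g. differentially expressed) genes
    k:        observed overlap between the selected genes and the gene set
    """
    total = comb(universe, selected)
    upper = min(in_set, selected)
    favourable = sum(
        comb(in_set, i) * comb(universe - in_set, selected - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Toy numbers: 10 genes, 4 in the gene set, 3 selected, 2 of them in the set.
p = hypergeom_upper_tail(2, universe=10, in_set=4, selected=3)
print(p)
```

A small p-value suggests that the gene set is over-represented among the selected genes. When many gene sets are tested, the resulting p-values also need a multiple testing adjustment.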

15. Solutions to Exercises

Abstract

sessionInfo prints version information about R and all loaded packages. This is helpful when posting on one of the R or Bioconductor mailing lists in order to provide detailed information about the software you are using.

Florian Hahne, Wolfgang Huber, Robert Gentleman, Seth Falcon

Backmatter

Title: Bioconductor Case Studies
Authors: Florian Hahne, Wolfgang Huber, Robert Gentleman, Seth Falcon
Publisher: Springer New York
Electronic ISBN: 978-0-387-77240-0
Print ISBN: 978-0-387-77239-4
DOI: https://doi.org/10.1007/978-0-387-77240-0