Course Details

  • Date: January 29th, 2015 - January 30th, 2015
  • Time: 9:30 am - 4:30 pm
  • Location: FAES Classroom 4
  • Presenter(s): David Wheeler, PhD. (Laboratory of Biochemistry and Molecular Biology, CCR, NCI), Sean Davis (CCR, NCI)



A Short Course in R for Biologists

“A Short Course in R for Biologists” is a two-day course given in four three-hour sessions entitled: Introduction to R, Introduction to Bioconductor, Introduction to Microarray Analysis, and Introduction to NGS Data Analysis.

Day      Morning Session, 9:30 AM-12:30 PM      Afternoon Session, 1:30 PM-4:30 PM
Jan 29   Introduction to R                      Introduction to Bioconductor
Jan 30   Introduction to Microarray Analysis    Introduction to NGS Data Analysis

Registration Required

Web-based resources for this class: (See Below for PDF versions)

The course includes frequent, short hands-on periods, so students should bring their own laptops with a working installation of R, version 3.1 or later. Several R packages used in the course must also be installed beforehand.

R is a console application. Students who prefer a more graphical working environment will find it much easier to run R inside RStudio. If you are comfortable running programs, viewing output, and editing files at the terminal, you do not need RStudio to take the course. Even so, RStudio offers many features that you may find useful, and it is well worth a look.

R Installation

The R program and instructions for installing it under Linux, Mac OS X, and Windows can be found here:

http://cran.r-project.org/

Bioconductor and Bioconductor Package Installation

Complete instructions for the installation of the basic and additional Bioconductor packages are found here:

http://www.bioconductor.org/install/

In addition to the basic Bioconductor package, please install these additional packages before the class starts:

Biostrings BSgenome BSgenome.Celegans.UCSC.ce6
TxDb.Celegans.UCSC.ce6.ensGene GenomicFeatures GenomicRanges
GenomicAlignments TxDb.Hsapiens.UCSC.hg19.knownGene affy
simpleaffy arrayQualityMetrics limma
survival ggplot2 hthgu133acdf
hthgu133a.db gplots

Briefly, the following code, executed from within an R session, should serve to install the basic Bioconductor package as well as the additional packages listed above:

# First, download the Bioconductor installer, biocLite()

source("http://bioconductor.org/biocLite.R")

# Now, use the installer to install several packages at once
# The base package, Biobase, will be installed automatically

biocLite(pkgs = c(
  "Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6",
  "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures", "GenomicRanges",
  "GenomicAlignments", "TxDb.Hsapiens.UCSC.hg19.knownGene", "affy",
  "simpleaffy", "arrayQualityMetrics", "limma",
  "survival", "ggplot2", "hthgu133acdf",
  "hthgu133a.db", "gplots"
))
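
After running the installer, it can help to confirm that everything on the list actually arrived. The following is a minimal sketch using only base R; the helper name missing_packages is hypothetical and is not part of R or Bioconductor.

```r
# Packages requested in the list above.
course_pkgs <- c(
  "Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6",
  "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures", "GenomicRanges",
  "GenomicAlignments", "TxDb.Hsapiens.UCSC.hg19.knownGene", "affy",
  "simpleaffy", "arrayQualityMetrics", "limma",
  "survival", "ggplot2", "hthgu133acdf",
  "hthgu133a.db", "gplots"
)

# Hypothetical helper: return the packages in `pkgs` that are not found
# in any of the current library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

not_installed <- missing_packages(course_pkgs)
if (length(not_installed) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

Any names reported as missing can be passed to the installer again. On recent R releases the biocLite.R script has been superseded by the BiocManager package, so BiocManager::install() would play the role of biocLite() there.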

RStudio Installation

Install the “Desktop, Open Source Edition”:

http://www.rstudio.com/products/RStudio/#Desk

Class Outline

Day 1 (Jan 29), Morning Session: Introduction to R

  • The R environment
    • Starting an R Session, Setting Options
    • Listing Variables, Editing Commands, Using the R History
    • Getting Help on an R Function
    • Logging a Session to a File
    • Running External R Code
    • Installing and Loading Packages
    • Ending a Session, Saving Your Work
  • The Elements of R
    • Numeric
    • Character
    • Logical
    • Missing Values
  • R Data Structures
    • Vectors
    • Matrices
    • Lists
    • Data.Frames
    • Factors
    • Functions
    • Other Complex Structures
  • Procedures
    • Reading and Writing Data
    • Exploring and Summarizing Data
    • Dealing with Missing Data
    • Restructuring Data
    • Relabeling Data
    • Subsetting Data
    • Operating on Rows or Columns of Data
    • Saving R Objects for Later Use
    • Graphing Data
    • Simple Statistical Tests
    • Example: A Simple Analysis of Probe Intensity Data
  • Project: Creating a Graphical Function in 4 Easy Steps
    • Step 1: Create an X-Y Plot to Compare Two Arrays
    • Step 2: Package the X-Y Plot as a Function
    • Step 3: Create a Median Array as a Better Standard for Comparison
    • Step 4: Rotate and Scale the Plot. Voilà, You Have Created a MAPlot!
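
As a rough preview of the four-step project, the sketch below builds an MA plot from simulated intensities in base R. It is not the presenters' code: the data are random, and the names intensities, median_array, ma_values, and ma_plot are hypothetical. It assumes the usual definitions M = log2(x) - log2(ref) and A = (log2(x) + log2(ref)) / 2.

```r
set.seed(1)

# Simulated probe intensities: 100 probes (rows) on three made-up arrays (columns).
intensities <- matrix(
  2^rnorm(300, mean = 8, sd = 1),
  nrow = 100,
  dimnames = list(NULL, c("array1", "array2", "array3"))
)

# Step 3: the per-probe median across arrays acts as the reference "array".
median_array <- apply(intensities, 1, median)

# Step 4: rotate and scale the log-log scatter.
# M is the log-ratio; A is the average log-intensity.
ma_values <- function(x, ref) {
  data.frame(
    A = (log2(x) + log2(ref)) / 2,
    M = log2(x) - log2(ref)
  )
}

# Steps 1 and 2: the scatter plot, packaged as a function.
ma_plot <- function(x, ref, ...) {
  ma <- ma_values(x, ref)
  plot(ma$A, ma$M, xlab = "A", ylab = "M", ...)
  abline(h = 0, lty = 2)  # probes with no change lie on this line
  invisible(ma)
}

# Draw the plot only in an interactive session.
if (interactive()) {
  ma_plot(intensities[, "array1"], median_array, main = "array1 vs. median array")
}
```

Keeping the arithmetic in ma_values() separate from the drawing in ma_plot() makes the transformation easy to check on small inputs before plotting real data.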

Day 1 (Jan 29), Afternoon Session: Introduction to Bioconductor

  • Installing Bioconductor
  • An Overview of Bioconductor Packages
  • Fundamental Packages
    • Biobase: the Foundation
    • Biostrings: A Representation of Biological Sequences
    • BSgenome: A Representation of Complete Genomic Sequences
    • GenomicRanges: Manipulation of Genomic Intervals
    • GenomicFeatures: Manipulation of Genomic Features
    • GenomicAlignments: Manipulation of Short Genomic Alignments
  • Two Fundamental Structures to Contain Experiment Data
    • The ExpressionSet for Array Data
      • Constructing an ExpressionSet
      • Analyzing an ExpressionSet
    • The SummarizedExperiment for NGS Sequence Data
      • Constructing a SummarizedExperiment
      • Analyzing a SummarizedExperiment

Day 2 (Jan 30), Morning Session: Introduction to Microarray Analysis

The objective of this session is to introduce students to the analysis of microarrays using R and Bioconductor. To help students take better advantage of the microarray services offered by the Laboratory of Molecular Technology at NCI-Frederick, the session will focus on the analysis of data from Affymetrix chips. It is assumed that the student has some knowledge of microarray workflows.

  • Downloading Data from The Cancer Genome Atlas Databases
  • Preliminary Steps: Array Pre-Processing
    • Checking the Quality of Arrays
    • Performing Array Normalization
  • Identifying Differentially Expressed Genes
  • Data Visualization
    • Performing Principal Component Analysis (PCA)
    • Computing and Interpreting Heatmaps
    • Computing and Interpreting Kaplan-Meier Curves
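
To give a flavor of the visualization topics, here is a minimal PCA sketch on simulated expression data using prcomp() from base R. The data, group labels, and effect size are made up for illustration, and the session itself may use different data and packages.

```r
set.seed(42)

n_genes <- 200
groups  <- rep(c("control", "treated"), each = 5)

# Simulated log-expression matrix: genes in rows, samples in columns.
expr <- matrix(
  rnorm(n_genes * length(groups)),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("sample", seq_along(groups))
  )
)

# Shift the first 50 genes upward in the treated samples.
expr[1:50, groups == "treated"] <- expr[1:50, groups == "treated"] + 3

# prcomp() expects observations in rows, so transpose to samples x genes.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

pc1 <- pca$x[, "PC1"]
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Plot the first two components, colored by group, in an interactive session.
if (interactive()) {
  plot(pca$x[, "PC1"], pca$x[, "PC2"],
       col = as.integer(factor(groups)), pch = 19,
       xlab = "PC1", ylab = "PC2")
}
```

Because the simulated group effect is large, the two groups should fall on opposite sides of the first principal component; with real array data the separation is typically less clean.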

Day 2 (Jan 30), Afternoon Session: Introduction to NGS Data Analysis

Details to be announced