Course Details

  • Date: November 9th, 2015 - November 10th, 2015
  • Time: 9:30 am - 4:30 pm
  • Location: FAES - Classroom 7/6
  • Presenter(s): David Wheeler, PhD. (Laboratory of Biochemistry and Molecular Biology, CCR, NCI), Sean Davis (CCR, NCI)

A Short Course in R for Biologists

“A Short Course in R for Biologists” is a two-day course given in four three-hour sessions entitled: Introduction to R, Introduction to Bioconductor, Introduction to Microarray Analysis, and Introduction to NGS Data Analysis.

Day     Morning Session, 9:30 AM-12:30 PM      Afternoon Session, 1:30 PM-4:30 PM
Nov 9   Introduction to R                      Introduction to Bioconductor
Nov 10  Introduction to Microarray Analysis    Introduction to NGS Data Analysis

Registration Required

PLEASE NOTE: This two-day workshop is a BYOC (Bring Your Own Computer) class. Government-issued or personal laptops are permitted. We can supply only a very limited number of computers, so if you want to take the class but cannot bring your own, please say so in the Comment section of the registration form.

Web-based resources for this class: (See Below for PDF versions)

The course will include frequent, short hands-on periods, so students should bring their own laptops with a working installation of R, version 3.1 or later. Several R packages will also be used; these must be installed before the course.
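
One way to confirm that a laptop meets the version requirement is to ask R itself. The following is a minimal sketch using only base R; the variable names `min_version` and `version_ok` are illustrative and not part of the course materials.

```r
# Minimum R version named in the course description
min_version <- "3.1.0"

# getRversion() returns the running R version in a form that can be
# compared directly against a version string
version_ok <- getRversion() >= min_version

if (version_ok) {
  message("R ", getRversion(), " meets the course requirement.")
} else {
  message("R ", getRversion(), " is older than ", min_version,
          "; please update R before the course.")
}
```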

R is a console application. Students who prefer a more graphically-oriented working environment will find that using RStudio as an environment in which to run R makes life much easier. If you are comfortable running programs, viewing output, and editing files at the terminal, you will not need RStudio in order to take the course. However, RStudio offers quite an array of functions that you may still find useful and it is well worth a look.

R Installation

The R program and instructions for installing it under Linux, Mac OS X, and Windows can be found here:

http://cran.r-project.org/

Bioconductor and Bioconductor Package Installation

Complete instructions for the installation of the basic and additional Bioconductor packages are found here:

http://www.bioconductor.org/install/

In addition to the basic Bioconductor package, please install these additional Bioconductor packages prior to the start of the class:

Biostrings BSgenome BSgenome.Celegans.UCSC.ce6
TxDb.Celegans.UCSC.ce6.ensGene GenomicFeatures GenomicRanges
GenomicAlignments TxDb.Hsapiens.UCSC.hg19.knownGene affy
simpleaffy arrayQualityMetrics limma
survival ggplot2 hthgu133acdf
hthgu133a.db gplots

Briefly, the following code, executed from within an R session, should serve to install the basic Bioconductor package as well as the additional packages listed above:

# First, download the Bioconductor installer, biocLite()

source("http://bioconductor.org/biocLite.R")

# Now, use the installer to install several packages at once
# The base package, Biobase, will be installed automatically

biocLite(pkgs = c(
  "Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6",
  "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures", "GenomicRanges",
  "GenomicAlignments", "TxDb.Hsapiens.UCSC.hg19.knownGene", "affy",
  "simpleaffy", "arrayQualityMetrics", "limma",
  "survival", "ggplot2", "hthgu133acdf",
  "hthgu133a.db", "gplots"
))
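
After installation, it may help to check that nothing was skipped. The sketch below uses only base R and compares the package list above with what is installed locally; the names `required` and `missing_pkgs` are illustrative, not part of the course materials.

```r
# Packages requested for the course (copied from the list above)
required <- c(
  "Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6",
  "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures", "GenomicRanges",
  "GenomicAlignments", "TxDb.Hsapiens.UCSC.hg19.knownGene", "affy",
  "simpleaffy", "arrayQualityMetrics", "limma",
  "survival", "ggplot2", "hthgu133acdf",
  "hthgu133a.db", "gplots"
)

# installed.packages() returns a matrix whose row names are package names
installed <- rownames(installed.packages())

# Anything requested but not yet installed
missing_pkgs <- setdiff(required, installed)

if (length(missing_pkgs) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

Any names reported as missing can be passed to the installer again. Note that on Bioconductor releases from 3.8 onward, `biocLite()` has been replaced by `BiocManager::install()`, so a much newer R installation would need that function instead.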

RStudio Installation

Install the “Desktop, Open Source Edition”:

http://www.rstudio.com/products/RStudio/#Desk

Class Outline

Day 1 (Nov 9), Morning Session: Introduction to R

  • The R environment
    • Starting an R Session, Setting Options
    • Listing Variables, Editing Commands, Using the R History
    • Getting Help on an R Function
    • Logging a Session to a File
    • Running External R Code
    • Installing and Loading Packages
    • Ending a Session, Saving Your Work
  • The Elements of R
    • Numeric
    • Character
    • Logical
    • Missing Values
  • R Data Structures
    • Vectors
    • Matrices
    • Lists
    • Data Frames
    • Factors
    • Functions
    • Other Complex Structures
  • Procedures
    • Reading and Writing Data
    • Exploring and Summarizing Data
    • Dealing with Missing Data
    • Restructuring Data
    • Relabeling Data
    • Subsetting Data
    • Operating on Rows or Columns of Data
    • Saving R Objects for Later Use
    • Graphing Data
    • Simple Statistical Tests
    • Example: A Simple Analysis of Probe Intensity Data
  • Project: Creating a Graphical Function in 4 Easy Steps
    • Step 1: Create an X-Y Plot to Compare Two Arrays
    • Step 2: Package the X-Y Plot as a Function
    • Step 3: Create a Median Array as a Better Standard for Comparison
    • Step 4: Rotate and Scale the Plot. Voila, You Have Created a MAPlot!
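
The four project steps can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code used in class: the intensities are simulated, and the names `intensities`, `median_array`, and `ma_plot` are hypothetical.

```r
set.seed(1)

# Simulated probe intensities: 1000 probes (rows) on 4 arrays (columns)
intensities <- matrix(2^rnorm(4000, mean = 8, sd = 2), ncol = 4,
                      dimnames = list(NULL, paste0("array", 1:4)))

# Step 3: the per-probe median across arrays serves as the reference array
median_array <- apply(intensities, 1, median)

# Steps 1, 2 and 4: a function that compares one array with a reference.
# Plotting M against A is the log-log X-Y plot rotated by 45 degrees
# and rescaled.
ma_plot <- function(x, ref, ...) {
  m <- log2(x) - log2(ref)        # M: log2 ratio
  a <- (log2(x) + log2(ref)) / 2  # A: mean log2 intensity
  plot(a, m, pch = ".", xlab = "A (mean log2 intensity)",
       ylab = "M (log2 ratio)", ...)
  abline(h = 0, col = "red")      # no-change line
  invisible(data.frame(A = a, M = m))
}

res <- ma_plot(intensities[, 1], median_array,
               main = "array1 vs. median array")
```

Returning the M and A values invisibly keeps the function usable both for plotting and for later calculations on the same values.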

Day 1 (Nov 9), Afternoon Session: Introduction to Bioconductor

  • Installing Bioconductor
  • An Overview of Bioconductor Packages
  • Fundamental Packages
    • Biobase: the Foundation
    • Biostrings: A Representation of Biological Sequences
    • BSgenome: A Representation of Complete Genomic Sequences
    • GenomicRanges: Manipulation of Genomic Intervals
    • GenomicFeatures: Manipulation of Genomic Features
    • GenomicAlignments: Manipulation of Short Genomic Alignments
  • Two Fundamental Structures to Contain Experiment Data
    • The ExpressionSet for Array Data
      • Constructing an ExpressionSet
      • Analyzing an ExpressionSet
    • The SummarizedExperiment for NGS Sequence Data
      • Constructing a SummarizedExperiment
      • Analyzing a SummarizedExperiment

Day 2 (Nov 10), Morning Session: Introduction to Microarray Analysis

The objective of this session is to introduce students to the analysis of microarrays using R and Bioconductor. To help students take better advantage of the microarray services offered by the Laboratory of Molecular Technology at NCI-Frederick, the session will focus on the analysis of data from Affymetrix chips. It is assumed that the student has some knowledge of microarray workflows.

  • Downloading Data from The Cancer Genome Atlas Databases
  • Preliminary Steps: Array Pre-Processing
    • Checking the Quality of Arrays
    • Performing Array Normalization
  • Identifying Differentially Expressed Genes
  • Data Visualization
    • Performing Principal Component Analysis (PCA)
    • Computing and Interpreting Heatmaps
    • Computing and Interpreting Kaplan-Meier Curves
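
To give a flavor of the visualization topics, here is a minimal sketch on simulated data. It is not the class material: the data are random numbers rather than TCGA arrays, base R's `heatmap()` stands in for whatever heatmap function the session uses, and the names `expr`, `group`, `pca`, and `km` are hypothetical. The `survival` package is on the installation list above.

```r
library(survival)

set.seed(42)

# Simulated expression matrix: 50 genes (rows) x 12 samples (columns)
expr <- matrix(rnorm(50 * 12), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("s", 1:12)))
group <- factor(rep(c("A", "B"), each = 6))

# Shift the first 10 genes upward in group B so the groups differ
expr[1:10, group == "B"] <- expr[1:10, group == "B"] + 2

# PCA: prcomp() expects samples in rows, so transpose the matrix
pca <- prcomp(t(expr), scale. = TRUE)
plot(pca$x[, 1:2], col = as.integer(group), pch = 19,
     main = "PCA of samples")

# Heatmap of the first 20 genes, each row scaled
heatmap(expr[1:20, ], scale = "row")

# Kaplan-Meier curves on simulated follow-up times (1 = event, 0 = censored)
surv_time <- rexp(12, rate = ifelse(group == "A", 0.1, 0.3))
event <- rbinom(12, 1, 0.8)
km <- survfit(Surv(surv_time, event) ~ group)
plot(km, col = 1:2, xlab = "Time", ylab = "Survival probability")
```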

Day 2 (Nov 10), Afternoon Session: Introduction to NGS Data Analysis

Details to be announced