### Course Details

• Date: October 22nd, 2015 - October 23rd, 2015
• Time: 9:30 am - 4:30 pm
• Location: FAES Room 3 – B1C207
• Presenter(s): David Wheeler, PhD. (Laboratory of Biochemistry and Molecular Biology, CCR, NCI), Sean Davis (CCR, NCI)

# A Short Course in R for Biologists

“A Short Course in R for Biologists” is a two-day course given in four three-hour sessions: Introduction to R, Introduction to Bioconductor, Introduction to Microarray Analysis, and Introduction to NGS Data Analysis.

| Day | Morning Session, 9:30 AM-12:30 PM | Afternoon Session, 1:30 PM-4:30 PM |
|---|---|---|
| Oct 22 | Introduction to R | Introduction to Bioconductor |
| Oct 23 | Introduction to Microarray Analysis | Introduction to NGS Data Analysis |

PLEASE NOTE: This two-day workshop is a BYOC (Bring Your Own Computer) class. Government-issued or personal laptops are permitted. We can supply only a very limited number of computers, so if you want to take the class but cannot bring your own, please say so in the Comment section of the registration form.

The course will include frequent, short hands-on periods, so students should bring their own laptops with a working installation of R, version 3.1 or later. Several R packages will also be used; these must be installed before the course.
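One quick way to confirm that an existing installation meets the version requirement is sketched below. It uses only base R; the variable names are illustrative.

```r
# Compare the running R version against the course minimum (3.1).
required <- "3.1.0"
current <- getRversion()

version_ok <- current >= required

if (version_ok) {
  message("R ", as.character(current), " meets the course requirement.")
} else {
  message("R ", as.character(current), " is older than ", required,
          "; please install a newer version.")
}
```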

R is a console application. Students who prefer a more graphically-oriented working environment will find that using RStudio as an environment in which to run R makes life much easier. If you are comfortable running programs, viewing output, and editing files at the terminal, you will not need RStudio in order to take the course. However, RStudio offers quite an array of functions that you may still find useful and it is well worth a look.

#### R Installation

The R program and instructions for installing it under Linux, Mac OS X, and Windows can be found here:

http://cran.r-project.org/

#### Bioconductor and Bioconductor Package Installation

Complete instructions for the installation of the basic and additional Bioconductor packages are found here:

http://www.bioconductor.org/install/

In addition to the basic Bioconductor package, please install these additional packages before the start of the class (most are Bioconductor packages; survival, ggplot2, and gplots come from CRAN):

 Biostrings BSgenome BSgenome.Celegans.UCSC.ce6 TxDb.Celegans.UCSC.ce6.ensGene GenomicFeatures GenomicRanges GenomicAlignments TxDb.Hsapiens.UCSC.hg19.knownGene affy simpleaffy arrayQualityMetrics limma survival ggplot2 hthgu133acdf hthgu133a.db gplots

Briefly, the following code, executed from within an R session, should serve to install the basic Bioconductor package as well as the additional packages listed above:

```r
# First, download the Bioconductor installer, biocLite()
source("http://bioconductor.org/biocLite.R")

# Now, use the installer to install several packages at once
# The base package, Biobase, will be installed automatically
biocLite(pkgs = c("Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6",
                  "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures",
                  "GenomicRanges", "GenomicAlignments",
                  "TxDb.Hsapiens.UCSC.hg19.knownGene", "affy", "simpleaffy",
                  "arrayQualityMetrics", "limma", "survival", "ggplot2",
                  "hthgu133acdf", "hthgu133a.db", "gplots"))
```
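After installation, it can be worth checking that nothing was skipped. The following is a minimal sketch, not part of the official instructions: it compares the package list given above against what is installed locally, using only base R. The variable names are illustrative.

```r
# Packages requested for the course (copied from the list above).
course_pkgs <- c("Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6",
                 "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures",
                 "GenomicRanges", "GenomicAlignments",
                 "TxDb.Hsapiens.UCSC.hg19.knownGene", "affy", "simpleaffy",
                 "arrayQualityMetrics", "limma", "survival", "ggplot2",
                 "hthgu133acdf", "hthgu133a.db", "gplots")

# Names of every package found in the local R libraries.
installed <- rownames(installed.packages())

# Course packages that are not yet installed.
not_installed <- setdiff(course_pkgs, installed)

if (length(not_installed) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

Any names reported as missing can be passed to the installer again.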

#### RStudio Installation

Install the “Desktop, Open Source Edition”:

http://www.rstudio.com/products/RStudio/#Desk

## Class Outline

### Day 1 (Oct 22), Morning Session: Introduction to R

• The R environment
  • Starting an R Session, Setting Options
  • Listing Variables, Editing Commands, Using the R History
  • Getting Help on an R Function
  • Logging a Session to a File
  • Running External R Code
  • Ending a Session, Saving Your Work
• The Elements of R
  • Numeric
  • Character
  • Logical
  • Missing Values
• R Data Structures
  • Vectors
  • Matrices
  • Lists
  • Data.Frames
  • Factors
  • Functions
  • Other Complex Structures
• Procedures
  • Exploring and Summarizing Data
  • Dealing with Missing Data
  • Restructuring Data
  • Relabeling Data
  • Subsetting Data
  • Operating on Rows or Columns of Data
  • Saving R Objects for Later Use
  • Graphing Data
  • Simple Statistical Tests
• Example: A Simple Analysis of Probe Intensity Data
• Project: Creating a Graphical Function in 4 Easy Steps
  • Step 1: Create an X-Y Plot to Compare Two Arrays
  • Step 2: Package the X-Y Plot as a Function
  • Step 3: Create a Median Array as a Better Standard for Comparison
  • Step 4: Rotate and Scale the Plot. Voilà, You Have Created an MA Plot!
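To give a flavor of the four-step project, here is a rough sketch in base R. It is one possible reading of the steps, not the instructors' code: the data are simulated, and the function names (`xy_plot`, `ma_values`, `ma_plot`) are made up for illustration. It assumes intensities are already on a log2 scale, so that M is the difference from the reference and A is the average with it.

```r
set.seed(1)

# Simulated log2 probe intensities: 1000 probes on 4 arrays (made-up data).
intensities <- matrix(rnorm(4000, mean = 8, sd = 2), nrow = 1000,
                      dimnames = list(NULL, paste0("array", 1:4)))

# Steps 1-2: an X-Y plot comparing two arrays, packaged as a function.
xy_plot <- function(x, y, ...) {
  plot(x, y, pch = ".", ...)
  abline(0, 1, col = "red")   # identity line: equal intensity on both axes
}

# Step 3: the probe-wise median across all arrays as a reference "array".
median_array <- apply(intensities, 1, median)

# Step 4: rotate and scale. M is the difference, A is the average.
ma_values <- function(x, ref) {
  data.frame(A = (x + ref) / 2, M = x - ref)
}

ma_plot <- function(x, ref, ...) {
  ma <- ma_values(x, ref)
  plot(ma$A, ma$M, pch = ".",
       xlab = "A (mean log2 intensity)", ylab = "M (log2 difference)", ...)
  abline(h = 0, col = "red")  # no difference from the reference
  invisible(ma)
}

xy_plot(median_array, intensities[, 1],
        xlab = "median array", ylab = "array1")
ma <- ma_plot(intensities[, 1], median_array, main = "array1 vs. median")
```

Plotting M against A turns the diagonal of the X-Y plot into a horizontal line at zero, which makes intensity-dependent deviations easier to see.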

### Day 1 (Oct 22), Afternoon Session: Introduction to Bioconductor

• Installing Bioconductor
• An Overview of Bioconductor Packages
• Fundamental Packages
  • Biobase: the Foundation
  • Biostrings: A Representation of Biological Sequences
  • BSgenome: A Representation of Complete Genomic Sequences
  • GenomicRanges: Manipulation of Genomic Intervals
  • GenomicFeatures: Manipulation of Genomic Features
  • GenomicAlignments: Manipulation of Short Genomic Alignments
• Two Fundamental Structures to Contain Experiment Data
  • The ExpressionSet for Array Data
    • Constructing an ExpressionSet
    • Analyzing an ExpressionSet
  • The SummarizedExperiment for NGS Sequence Data
    • Constructing a SummarizedExperiment
    • Analyzing a SummarizedExperiment
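As a small preview of the ExpressionSet topic, the sketch below builds one from a toy matrix using Biobase. The expression values and the sample annotation (a hypothetical `group` column) are invented for illustration; real data would come from an experiment.

```r
library(Biobase)

set.seed(1)

# Toy expression matrix: 5 probes (rows) by 4 samples (columns).
expr <- matrix(rnorm(20, mean = 8), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("sample", 1:4)))

# Sample annotation; row names must match the matrix column names.
pheno <- data.frame(group = c("control", "control", "treated", "treated"),
                    row.names = colnames(expr))

# Combine the data and its annotation into one object.
eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(pheno))

# Subsetting keeps the expression values and annotation in sync.
treated <- eset[, eset$group == "treated"]

dim(exprs(eset))        # probes x samples
sampleNames(treated)    # samples remaining after subsetting
```

The appeal of the container is that a single subsetting operation filters the expression values and the sample annotation together.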

### Day 2 (Oct 23), Morning Session: Introduction to Microarray Analysis

The objective of this session is to introduce students to the analysis of microarrays using R and Bioconductor. To help students take better advantage of the microarray services offered by the Laboratory of Molecular Technology at NCI-Frederick, the session will focus on the analysis of data from Affymetrix chips. It is assumed that the student has some knowledge of microarray workflows.
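For orientation, a two-group comparison might look roughly like the sketch below. It is an assumption that limma (one of the packages in the installation list) is used this way in the session; the text above does not say. The matrix is simulated rather than read from Affymetrix CEL files, and the probe and chip names are invented.

```r
library(limma)

set.seed(1)

# Simulated log2 expression values: 100 probes on 6 chips (made-up data).
expr <- matrix(rnorm(100 * 6, mean = 8), nrow = 100,
               dimnames = list(paste0("probe", 1:100), paste0("chip", 1:6)))

# Three control chips followed by three treated chips.
group <- factor(rep(c("control", "treated"), each = 3))

# Shift the first five probes upward in the treated chips.
expr[1:5, group == "treated"] <- expr[1:5, group == "treated"] + 3

# Design matrix with an intercept and a treated-vs-control coefficient.
design <- model.matrix(~ group)

# Fit a linear model per probe, then moderate the variances.
fit <- eBayes(lmFit(expr, design))

# The five probes with the strongest evidence of a group difference.
top <- topTable(fit, coef = "grouptreated", number = 5)
print(top)
```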