Course Details

  • Date: December 20th, 2016
  • Time: 9:30 am - 4:00 pm
  • Location: NIH Bldg 10, FAES Room 4 (B1C205)
  • Presenter(s): David Wheeler, PhD. (Laboratory of Biochemistry and Molecular Biology, CCR, NCI)


R is a freely available language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques, including linear and nonlinear modelling, statistical tests, time series analysis, classification, and clustering. It is a console application that compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. It is available for download through CRAN, a network of ftp and web servers around the world that store identical, up-to-date versions of code and documentation for R. Bioconductor is built on the R statistical programming language and, like R, is open source and open development. It provides tools for the analysis and comprehension of high-throughput genomic data. The course will include multiple short hands-on exercises spread across the two lecture sessions.

PLEASE NOTE: This 1-day workshop is a BYOC (Bring Your Own Computer) class. Government-issued or personal laptops are permitted. We can supply only a very limited number of computers, so if you want to take the class but cannot bring your own computer, please say so in the Comment section of the registration form.

REQUIRED INSTALLATION:  Students who bring their own laptops should ensure that R v3.3.1 and Bioconductor v3.4 are installed on their computers. In addition, several R packages (listed below) will be used and must be installed prior to the course. Please follow the instructions further below to complete the installation of the packages required for the workshop.

R Installation

The R program and instructions for its installation can be found by clicking the link provided below.

Please choose the download that matches your system: Linux, Mac OS X, or Windows.
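Once R is installed, one way to confirm that the running version meets the workshop requirement is to compare it from within an R session. The following is a minimal sketch; the variable names are illustrative only.

```r
# Version of the running R interpreter, as a comparable version object
current <- getRversion()
required <- "3.3.1"  # version requested for this workshop

print(R.version.string)

# Comparing a version object with a string uses version ordering,
# so "3.10.0" correctly counts as newer than "3.3.1"
if (current >= required) {
  message("R ", as.character(current), " meets the workshop requirement.")
} else {
  message("R ", as.character(current), " is older than ", required, "; please update.")
}
```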

Bioconductor and Bioconductor Package Installation

Complete instructions for the installation of the basic and additional Bioconductor packages are found here:

In addition to the basic Bioconductor package, please install these additional Bioconductor packages prior to the start of the class:

Biostrings BSgenome BSgenome.Celegans.UCSC.ce6
TxDb.Celegans.UCSC.ce6.ensGene GenomicFeatures GenomicRanges
GenomicAlignments TxDb.Hsapiens.UCSC.hg19.knownGene  

Command-line instructions for Bioconductor and packages: The following code, executed from within an R session, should serve to install the basic Bioconductor package as well as the additional packages listed above.

# First, download the Bioconductor installer, biocLite()
source("https://bioconductor.org/biocLite.R")

# Now, use the installer to install several packages at once
# The base package, Biobase, will be installed automatically

biocLite(pkgs=c("Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6", "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures", "GenomicRanges", "GenomicAlignments", "TxDb.Hsapiens.UCSC.hg19.knownGene"))
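After the installation finishes, it can be worth checking that every requested package is actually present before the class. The sketch below uses only base R; `missing_packages` is a hypothetical helper written for this illustration, not part of R or Bioconductor.

```r
# Packages requested for the workshop (copied from the list above)
workshop_pkgs <- c("Biostrings", "BSgenome", "BSgenome.Celegans.UCSC.ce6",
                   "TxDb.Celegans.UCSC.ce6.ensGene", "GenomicFeatures",
                   "GenomicRanges", "GenomicAlignments",
                   "TxDb.Hsapiens.UCSC.hg19.knownGene")

# Hypothetical helper: return the names in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

still_missing <- missing_packages(workshop_pkgs)
if (length(still_missing) == 0) {
  message("All workshop packages are installed.")
} else {
  message("Not yet installed: ", paste(still_missing, collapse = ", "))
}
```

Any names reported as not yet installed can be passed to biocLite() again.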

RStudio Installation (not required for workshop, but some users may find it useful)

Students who prefer a more graphical working environment may find that running R inside RStudio makes life much easier.  It offers a wide array of functions that you may find useful, and it is well worth a look. Install the “Desktop, Open Source Edition”:

Workshop Agenda

Morning Session – 9:30 am – 12:30 pm

Introduction to R

  • The R environment
    • Starting an R Session, Setting Options
    • Listing Variables, Editing Commands, Using the R History
    • Getting Help on an R Function
    • Logging a Session to a File
    • Running External R Code
    • Installing and Loading Packages
    • Ending a Session, Saving Your Work
  • The Elements of R
    • Numeric
    • Character
    • Logical
    • Missing Values
  • R Data Structures
    • Vectors
    • Matrices
    • Lists
    • Data.Frames
    • Factors
    • Functions
    • Other Complex Structures
  • Procedures
    • Reading and Writing Data
    • Exploring and Summarizing Data
    • Dealing with Missing Data
    • Restructuring Data
    • Relabeling Data
    • Subsetting Data
    • Operating on Rows or Columns of Data
    • Saving R Objects for Later Use
    • Graphing Data
    • Simple Statistical Tests
    • Example: A Simple Analysis of Probe Intensity Data
  • Project: Creating a Graphical Function in 4 Easy Steps
    • Step 1: Create a Heatmap of Gene Expression Data
    • Step 2: Package Heatmap as a Function
    • Step 3: Add some Custom Formatting
    • Step 4: Save for Future Use and – Voila, You Have Created your own Heatmap Library!
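As a rough preview of the four-step project, the sketch below uses only base R. The simulated matrix, the function name `my_heatmap`, the colour palette, and the file name are all assumptions made for illustration; the workshop's own data and code may differ.

```r
# Step 1: create a heatmap of gene expression data
# (simulated values stand in for real data: 20 genes x 6 samples)
set.seed(1)
expr <- matrix(rnorm(120), nrow = 20, ncol = 6,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))
heatmap(expr)

# Steps 2 and 3: package the call as a function and add custom formatting
my_heatmap <- function(x, title = "Expression heatmap",
                       palette = colorRampPalette(c("blue", "white", "red"))(50)) {
  stopifnot(is.matrix(x), is.numeric(x))
  # scale = "row" centres and scales each gene before colouring
  heatmap(x, col = palette, scale = "row", main = title, margins = c(6, 6))
}

result <- my_heatmap(expr, title = "Simulated data")

# Step 4: save the function to a file so it can be reused later
lib_file <- file.path(tempdir(), "my_heatmap_library.R")
dump("my_heatmap", file = lib_file)
```

In a later session, `source(lib_file)` would restore the function. A permanent path would normally replace `tempdir()`, which is used here only so that the sketch leaves no files behind.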

12:30 – 1:00 pm   LUNCH BREAK

Afternoon Session – 1:00 – 4:00 pm

Introduction to Bioconductor

  • Installing Bioconductor
  • An Overview of Bioconductor Packages
  • Fundamental Packages
    • Biobase: the Foundation
    • Biostrings: A Representation of Biological Sequences
    • BSgenome: A Representation of Complete Genomic Sequences
    • GenomicRanges: Manipulation of Genomic Intervals
    • GenomicFeatures: Manipulation of Genomic Features
    • GenomicAlignments: Manipulation of Short Genomic Alignments
  • Two Fundamental Structures to Contain Experiment Data
    • The ExpressionSet for Array Data
      • Constructing an ExpressionSet
      • Analyzing an ExpressionSet
    • The SummarizedExperiment for NGS Sequence Data
      • Constructing a SummarizedExperiment
      • Analyzing a SummarizedExperiment
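To give a flavour of the two container classes named above, here is a minimal sketch that builds each one from simulated data. It assumes the Biobase, GenomicRanges and SummarizedExperiment packages are installed; all values, sample names and genomic coordinates are invented for illustration.

```r
library(Biobase)
library(GenomicRanges)
library(SummarizedExperiment)

set.seed(2)

# ExpressionSet: a matrix of array intensities plus sample annotation
intensities <- matrix(rnorm(12, mean = 8), nrow = 4,
                      dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))
pheno <- data.frame(treatment = c("control", "treated", "treated"),
                    row.names = colnames(intensities))
eset <- ExpressionSet(assayData = intensities,
                      phenoData = AnnotatedDataFrame(pheno))

# Subsetting by a sample annotation keeps data and annotation in sync
treated <- eset[, eset$treatment == "treated"]
print(dim(exprs(treated)))

# SummarizedExperiment: a count matrix, genomic ranges for the rows,
# and sample annotation for the columns
counts <- matrix(rpois(12, lambda = 50), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("lib", 1:3)))
gene_ranges <- GRanges(seqnames = "chrI",
                       ranges = IRanges(start = c(100, 500, 900, 1500), width = 200),
                       strand = c("+", "-", "+", "-"))
names(gene_ranges) <- rownames(counts)
se <- SummarizedExperiment(assays = list(counts = counts),
                           rowRanges = gene_ranges,
                           colData = DataFrame(condition = c("wt", "wt", "mutant"),
                                               row.names = colnames(counts)))

# Total counts per library, taken from the "counts" assay
print(colSums(assay(se, "counts")))
```

Both objects keep the measurements and their annotation together, so subsetting rows or columns with `[` carries the matching annotation along.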