Course Details

  • Date: October 3rd, 2016 - October 4th, 2016
  • Time: 9:30 am - 4:00 pm
  • Location: NIH Bldg 10, FAES Room 4 (B1C205)
  • Presenter(s): Maggie Cam (NCI CCBR), Parthav Jailwala (CCBR)
The maximum number of registrations allowed (25) has been reached for this workshop. You will be put on the waitlist and informed if any cancellations occur. You are welcome to come on the morning of the workshop and be seated on a first-come, first-served basis if there are any last-minute dropouts.

Learn the fundamentals of microarray technology and how to design experiments for your research using this technology. After attending this 2-day workshop, you will understand the best practices for analyzing gene expression data from microarrays, and know how to do that using GEO2R (open source tool) and Partek Genomics Suite (NCI-licensed commercial tool). As we walk through a hands-on analysis of a cancer dataset, you will learn the principles of experimental design, batch correction, and statistics, and how to extract biological meaning from the results using multiple sets of tools and applications. A significant portion of the final session on the second day is dedicated to allowing attendees to work independently on a publicly available dataset and implement the analysis workflow using the knowledge gained over the course of the workshop.

PLEASE NOTE: This workshop is a BYOC (Bring Your Own Laptop Computer) class, and requires installation of Partek Genomics Suite on your laptop ahead of Day 2 of the workshop. Government-issued or personal computers are permitted. We will be able to supply a very limited number of computers, so if you want to take the class but cannot bring your own computer, please indicate this in the Comment section on the registration form.

Workshop Agenda

Day 1 – October 3, 2016 (Monday)

9:30 am – 10:00 am              Welcome and Workshop Overview (Anand S. Merchant, Ph.D. – CCBR)

10:00 am – 12:30 pm           Introductory Lecture: Microarray Technology and Data Analysis (Maggie Cam, PhD – CCR)


  • Historical Perspective
  • Microarray Technologies, Sample Processing Methods
  • Microarray comparisons to RNA-Seq

Data Analysis

  • Experimental Design
  • QC methods
  • Preprocessing: Normalization and low level analysis algorithms

Statistical Analysis

  • Common statistical models used for analysis of microarray data
  • Examples of blocking
  • Batch effects and removal methods

11:15 – 11:30 am BREAK

Visualization and Clustering

  • Volcano Plot
  • Principal Components Analysis
  • Hierarchical Clustering
  • K-means Clustering

Validation and Downstream Analysis

  • Validation methods
  • Major Software applications
  • Public Repositories of Microarray Data
  • Gene Ontology Enrichment and Pathway analysis tools

12:30 – 1:30 pm LUNCH BREAK

1:30 – 2:45 pm                  GEO2R: Open-source web tool for querying publicly available datasets (Parthav Jailwala, MSc – CCBR)

GEO2R is an interactive web tool that allows users to compare two or more groups of samples in a GEO Series in order to identify genes that are differentially expressed across experimental conditions. GEO2R performs comparisons on original submitter-supplied processed data tables using the GEOquery and limma R packages from the Bioconductor project. Bioconductor is an open source software project based on the R programming language that provides tools for the analysis of high-throughput genomic data. The GEOquery R package parses GEO data into R data structures that can be used by other R packages. The limma (Linear Models for Microarray Data) R package has emerged as one of the most widely used statistical tools for identifying differentially expressed genes. It handles a wide range of experimental designs and data types and applies multiple-testing corrections to P-values to help control the number of false positives. Thus, GEO2R provides a simple interface that allows users to perform R statistical analysis without command line expertise.
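To make the idea of a multiple-testing correction concrete, here is a minimal sketch of the Benjamini-Hochberg false discovery rate adjustment, one commonly used correction of this kind. It is written in Python with NumPy purely for illustration and is not the code that GEO2R or limma runs; the function name `benjamini_hochberg` and the example P-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                          # indices that sort P-values ascending
    scaled = p[order] * n / np.arange(1, n + 1)    # p * n / rank
    # Enforce monotonicity: an adjusted value may not exceed the one ranked above it.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)    # restore the input order
    return adjusted

# Hypothetical raw P-values for five genes (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
for r, a in zip(raw, adj):
    print(f"raw={r:.3f}  adjusted={a:.5f}")
```

Genes are then typically called differentially expressed when the adjusted value falls below a chosen threshold, rather than the raw P-value.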

Lecture and Hands-on session for GEO2R

  • Background on GEO datasets
  • What is GEO2R and how can it help you
  • How to use GEO2R
  • Options and features
  • Limitations and caveats
  • Hands-on exercise

2:45 – 3:00 pm BREAK

3:00 – 4:00 pm                  CCBR Microarray Data Analysis Pipeline Demonstration (Fathi Elloumi, Ph.D., CCBR)

Attendees will learn about the complete workflow implemented by CCBR for microarray data analysis. The speaker will demonstrate the practical steps based on the theoretical concepts discussed in the morning, and will discuss topics that include:

  • Objectives of and requirements for pipeline
  • Input data types, files and format
  • Initial quality checks of the data
  • Visuals of ‘good’ and ‘bad’ data
  • Differential expression analysis
  • Statistical parameters
  • Downstream enrichment analysis
  • Planned future enhancements

Day 2 – October 4, 2016 (Tuesday)

9:30 am – 12:30 pm          Gene Expression Workflow with Partek Genomics Suite (Eric Seiser, PhD – FAS, Partek)

The training will include a guided analysis of an Affymetrix gene expression data set to familiarize users with the Gene Expression analysis workflow, covering the topics listed below. Following this analysis, attendees will be given the task of obtaining a data set from the NCBI Gene Expression Omnibus (GEO) and running an independent analysis of the data to attempt to replicate the findings of the associated publication. Attendees will be given a list of analysis goals and will have the opportunity to ask for help from the instructor as they work through this analysis. The goal of this hands-on time is to give attendees experience analyzing real-world data independently.

  • Import data – Affymetrix CEL files
  • Perform QA/QC of imported data
  • Exploratory data analysis – Principal Component Analysis (PCA)
  • Detect differential expression (ANOVA) – two-factor analysis
  • Gene list creation (Venn diagram creation and list overlap)
  • Visualization (PCA, histogram, box plot, dot plot, volcano plot, heatmap, etc.)

12:30 – 1:30 pm LUNCH BREAK      

1:30 – 4:00 pm           Hands-on Gene Expression Data Analysis with PGS – Continued

  • Additional features (1:30 – 2:30 pm)
    • Biological interpretation – through use of Gene Ontology and KEGG pathways
    • Integration with other data – combining gene and miRNA expression data
    • Other topics – Batch Effect Removal, Survival analysis
  • Independent hands-on exercise (with GEO dataset) for attendees (2:30 – 4:00 pm)
    • Download and Import data
    • Use PCA to identify factors for statistical modeling
    • Identify differentially expressed genes
    • Generate a heatmap
    • Find important pathways
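
As a rough illustration of the "Use PCA to identify factors" step in the exercise above, the following sketch computes principal component scores for a small synthetic expression matrix. It is written in Python with NumPy and is not how Partek Genomics Suite implements PCA; the data, group labels, and the helper name `pca_scores` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_per_group = 200, 4

# Synthetic log2 expression values: rows are samples, columns are genes.
control = rng.normal(8.0, 0.3, size=(n_per_group, n_genes))
treated = rng.normal(8.0, 0.3, size=(n_per_group, n_genes))
treated[:, :20] += 2.0  # shift 20 genes upward in the treated group

expr = np.vstack([control, treated])
labels = ["control"] * n_per_group + ["treated"] * n_per_group

def pca_scores(matrix, n_components=2):
    """Project samples onto the leading principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)           # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    explained = (s ** 2) / np.sum(s ** 2)             # variance fraction per PC
    return scores, explained[:n_components]

scores, explained = pca_scores(expr)
for label, (pc1, pc2) in zip(labels, scores):
    print(f"{label:8s} PC1={pc1:7.2f} PC2={pc2:7.2f}")
print("variance explained:", np.round(explained, 3))
```

When samples separate along a leading component according to a known annotation (treatment, batch, tissue), that annotation is a candidate factor to include in the statistical model.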