Course Details

  • Date: September 22nd, 2015 - September 23rd, 2015
  • Time: 9:30 am - 4:30 pm
  • Location: Bldg 10: FAES Classroom 7 (B1C206)
  • Presenter(s): Maggie Cam (NCI CCBR), Parthav Jailwala (CCBR), Xiaowen Wang (Partek)

Learn the basics of microarray gene expression analysis using Partek Genomics Suite and open source tools. As we walk through a hands-on analysis of a cancer dataset, you will learn the principles of experimental design, batch correction, and statistics, and how to extract biological meaning from the results using tools for gene set and pathway analysis.

PLEASE NOTE: This 2-day workshop is a BYOC (Bring Your Own laptop Computer) class. Government-issued or personal computers are permitted. We can supply only a very limited number of computers, so if you want to take the class but cannot bring your own computer, please say so in the Comment section of the registration form.

Directions to FAES Classroom 7 (B1C206) can be found here:

Day 1 – AM (9:30-11:30): Introductory Lecture
(Maggie Cam, PhD – CCR, NCI)


  • Historical Perspective
  • Microarray Technologies, Sample Processing Methods
  • Microarray comparisons to RNA-Seq

Data Analysis

  • Experimental Design
  • QC methods
  • Preprocessing: Normalization and low level analysis algorithms

Statistical Analysis

  • Common statistical models used for analysis of microarray data
  • Examples of blocking
  • Batch effects and removal methods

Visualization and Clustering

  • Volcano Plot
  • Principal Components Analysis
  • Hierarchical Clustering
  • K-means Clustering

Validation and Downstream Analysis

  • Validation methods
  • Gene Ontology Enrichment and Pathway analysis tools
  • Major Software applications
  • Public Repositories of Microarray Data


Day 1 – PM (2:00-4:30): Hands-on Gene Expression Data Analysis in Partek Genomics Suite
(Xiaowen Wang, PhD – Partek)

Attendees will learn how to use the basic features of Partek Genomics Suite for the analysis of gene expression data. An Affymetrix gene expression dataset will be used to work through the Gene Expression workflow:

  • Import data
  • Perform QA/QC of imported data
  • Exploratory data analysis
  • Detect differential expression (ANOVA)
  • Gene list creation

Day 2 – AM (9:30-11:30): Hands-on Gene Expression Data Analysis in Partek Genomics Suite – Continued
(Xiaowen Wang, PhD – Partek)

  • Biological interpretation
  • Visualization (PCA, histogram, box plot, dot plot, volcano plot, interaction plot, heatmap, etc.)

Day 2 – PM (1:30-2:30): GEO2R
(Parthav Jailwala, MSc – CCBR, NCI)

GEO2R is an interactive web tool that allows users to compare two or more groups of samples in a GEO Series in order to identify genes that are differentially expressed across experimental conditions. GEO2R performs comparisons on original submitter-supplied processed data tables using the GEOquery and limma R packages from the Bioconductor project. Bioconductor is an open source software project based on the R programming language that provides tools for the analysis of high-throughput genomic data. The GEOquery R package parses GEO data into R data structures that can be used by other R packages. The limma (Linear Models for Microarray Analysis) R package has emerged as one of the most widely used statistical tools for identifying differentially expressed genes. It handles a wide range of experimental designs and data types and applies multiple-testing corrections to P-values to help control the number of false positives. Thus, GEO2R provides a simple interface that allows users to perform R statistical analysis without command line expertise.


  • Background on GEO datasets
  • What is GEO2R and how can it help you
  • How to use GEO2R
  • Options and features
  • Limitations and caveats
  • Hands-on exercise
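
As a rough illustration of the multiple-testing correction mentioned above, the sketch below implements the Benjamini-Hochberg false discovery rate adjustment in plain Python. It is not GEO2R or limma code. The function name `benjamini_hochberg` and the example P-values are made up for illustration, and the choice of Benjamini-Hochberg as the adjustment method is an assumption (other corrections exist).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative helper only; the name and interface are hypothetical.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values get smaller.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for five hypothetical genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted p = {q:.5f}")
```

With these made-up values, the third and fourth genes have raw P-values below 0.05 but adjusted values just above it, which shows how the correction can change which genes are called significant.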

Day 2 – PM (2:30-3:30): DAVID
(David/Dawei Huang, M.D. – LMB, CCR, NCI)

The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.


Day 2 – PM (3:30-4:30): Gene Set Enrichment Analysis (GSEA)
(Maggie Cam, PhD – CCR, NCI)

GSEA is a computational method that determines which (if any) a priori defined sets of genes are significantly differentially expressed, as an ensemble, between two biological states. It is an open-source program developed by the Broad Institute.


  • The general approach of gene set enrichment methods and comparison with DAVID
  • How GSEA measures differential expression for each set of genes
  • Controlling effects of multiple comparisons in GSEA (false discovery rate)
  • The Broad Institute library of groups of gene sets (MSigDB)
  • What files and formats are needed for GSEA
  • User options and running GSEA


  • Loading the GSEA required input files for an example dataset
  • Using and choosing values in the GSEA GUI
  • Rank-based analysis
  • Full dataset analysis
  • Understanding the GSEA outputs and judging significance in the results
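
To give a feel for how a gene set can be scored "as an ensemble", the sketch below computes a running-sum enrichment score over a ranked gene list, in the spirit of the weighted Kolmogorov-Smirnov-like statistic commonly described for GSEA. It is a simplified illustration, not the Broad Institute implementation: the function name `enrichment_score`, the `weight` parameter, and the toy gene names and scores are all assumptions, and the permutation testing, normalization, and FDR steps are left out.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set (illustrative sketch).

    ranked_genes: gene identifiers, sorted by decreasing ranking metric
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of gene identifiers to test
    weight:       exponent applied to |score| for genes in the set
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_genes = len(ranked_genes)
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n_genes:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    # Total weight of the genes in the set, used to normalise hit steps.
    hit_total = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if hit_total == 0:
        raise ValueError("genes in the set all have a zero ranking metric")
    miss_step = 1.0 / (n_genes - n_hits)

    running = 0.0
    best = 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_total  # step up on a hit
        else:
            running -= miss_step                     # step down on a miss
        # Keep the largest deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list: made-up genes and ranking metrics.
genes = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
top_es = enrichment_score(genes, metric, {"A", "B"})
bottom_es = enrichment_score(genes, metric, {"E", "F"})
print("set at top of list:   ", top_es)
print("set at bottom of list:", bottom_es)
```

In this toy example, a set concentrated at the top of the ranked list gets a positive score and a set concentrated at the bottom gets a negative one. Judging significance would still require comparing each score against a null distribution, which this sketch does not do.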