Course Details

  • Date: March 8th, 2016 - March 9th, 2016
  • Time: 9:30 am - 4:30 pm
  • Location: NIH Bldg 10 FAES and NCI-Frederick Bldg 549 Library
  • Presenter(s): David Wheeler, PhD (Laboratory of Biochemistry and Molecular Biology, CCR, NCI), Devendra Mistry (QIAGEN), Justin Lack (NIAID CBR), Maggie Cam (NCI CCBR), Michael Ryan (Johns Hopkins/In Silico Solutions/MDACC)

BTEP Workshop on Exome-Seq Data Analysis and Variant Annotation (2-day)

This workshop will cover the basics and best practices of exome-seq analysis, including downstream interpretation of variants using a variety of in-house, open-source, and commercial web tools (CCBR Exome-Seq Pipeliner, AVIA, Ingenuity Variant Analysis, and CRAVAT/MuPIT).

Please note that this workshop will be telecast remotely to the Library Training Room in Bldg 549 at NCI-Frederick for attendees who choose to register at that location.

Dates: March 8-9, 2016 (Tuesday and Wednesday)

Time: 9:30 am – 12:30 pm and 1:30 – 4:30 pm (both days)


Live Workshop – NIH Bethesda – Bldg 10, FAES Room 6

Remote Simulcast – Scientific Library Training Room, Bldg 549, NCI-Frederick

For more information on the Frederick simulcast, please contact:

Tracie Frederick,
Technology Informationist,
Scientific Library, NCI at Frederick
Phone: 301-846-1094


Day 1 – Tuesday, March 8, 2016

9:30 am to 12:30 pm – Introductory Lectures

Chunhua Yan, PhD – Primer on Next Generation and Exome Sequencing

This will be an introduction to NGS in general and Exome-Seq in particular, covering:

    •    Next-generation sequencing technology
    •    Exome sequencing (cost, speed, gene coverage, biological implications)
    •    Experimental design (sample size, coverage, sample submission)
    •    Mutation calling (DREAM Challenge, Genome in a Bottle)

Justin Lack, PhD – Exome-Seq Data Analysis Pipeline: From Reads to Results

This talk will provide an overview of the exome-seq pipeline workflow, with recommended best practices.

Some of the topics covered will be:
    •    Read quality trimming and adapter clipping,
    •    Initial QC and read mapping challenges,
    •    Impact and removal of PCR duplicates,
    •    Local realignment around indels and base quality score recalibration,
    •    Germline variant calling with the GATK HaplotypeCaller,
    •    Somatic variant detection using MuTect,
    •    All-in-one annotator (AVIA), and
    •    An example of a downstream analysis of a tumor/germline comparison data set.

12:30 – 1:30 pm LUNCH BREAK

1:30-4:30 pm – Open-Source Software Tools for Analysis of Exome-Seq Data

  • David Wheeler, PhD: Brief introduction to the graphical user interface (GUI) of the CCBR Exome-Seq Pipeliner

This presentation will introduce the concept of pipelines in general and discuss their robustness and reproducibility, as well as parallel execution, tracking of inputs and outputs, and pipeline reports. A brief discussion of the modular features of Snakemake will serve as a segue into the Pipeliner program and how a pipeline is defined. This will be followed by a demonstration of the pipeline and how it can be used for exome-seq data analysis.

  • Hue Vuong, PhD: Introduction and Tutorial on AVIA

The Annotation, Visualization, and Impact Analysis application, AVIA, is an interactive web-based annotation server used to explore and interpret large sets of single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). In addition to assigning the gene impact of genomic variants, AVIA lets users perform custom annotation of their variants using various disparate data sources, such as SIFT, PolyPhen-2, TargetScan, and nonB. Using AVIA, users will be able to annotate files, filter variants, and view gene-level annotations for their variants. Users may upload VCF4, BED, or CLC bio variant files, either as plain text or in compressed formats (zip, gz, tar). Users can also explore gene-level effects from PharmGKB, DrugBank, Gene Ontology (GO), DAVID, and other sources. In this hands-on workshop, users will learn how to submit a set of variants to AVIA and how to navigate the results page to find variants of possible significance.

  • Maggie Cam, PhD: Integrative Genomics Viewer (IGV) Tutorial

The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations. The hands-on tutorial will walk attendees through visualizing variant (VCF) and alignment (BAM) files in IGV.

Day 2 – Wednesday, March 9, 2016

9:30 am – 12:30 pm – Commercial Software for Exome-Seq Data: QIAGEN’s Ingenuity Variant Analysis

Devendra (Dev) Mistry, Field Application Specialist

Ingenuity Variant Analysis (IVA) combines analytical tools and integrated content to help you rapidly identify and prioritize variants by drilling down to a small, targeted subset of compelling variants, based on both published biological evidence and your own knowledge of disease biology. This workshop will show users how to upload their datasets, use the different filters within IVA efficiently to identify causal variants, and export data; it will also cover recent IVA updates. With IVA, you can interrogate your variants from multiple biological perspectives, explore different biological hypotheses, and identify the most promising variants for follow-up.

12:30 – 1:30 pm LUNCH BREAK

1:30-4:30 pm – CRAVAT/MuPIT: Academic Open-Source Tool for Analysis of Genomic Variants

Michael Ryan, PhD

CRAVAT is a free tool for high-throughput analysis of human sequencing variants, developed by the Karchin lab at Johns Hopkins and In Silico Solutions. CRAVAT accepts very large variant data files containing single nucleotide substitutions as well as indels, and returns a wide variety of annotations and scores that help identify important variants. The workshop will provide some background on CRAVAT and many hands-on exercises to learn how to use the tool and interpret the results.

MuPIT (mupit.icm.jhu) is a sister tool to CRAVAT that displays human mutations on 3D protein structures. The workshop will focus on a series of exercises showing how to visualize your mutations in MuPIT, how CRAVAT and MuPIT are integrated, and how to manipulate, investigate, and understand the results.